Abstract: We study the properties of adaptive walks performed by a maladapted asexual population in which beneficial mutations fix sequentially until a local fitness peak is reached. Here we consider three factors that govern the adaptation dynamics: the extreme value domain of beneficial mutations, the initial distance to the local fitness optimum and the correlations amongst the fitnesses. We show that there is a transition in the behaviour of the walk length and of the average fitness fixed during adaptation when the mean and the variance of the fitness distribution, respectively, become infinite. When the mean is finite, the walk length decreases logarithmically with initial fitness, but it is a constant otherwise. We also find that the walks are longer for faster decaying fitness distributions and for correlated fitnesses. For fitness distributions with finite variance, the fitness fixed during the initial steps does not depend on the fitness of the local optimum, but it increases with the local peak fitness otherwise. Interestingly, the fitness difference between successive steps shows a pattern of diminishing returns for bounded distributions and accelerating returns for fat-tailed distributions. These trends are found to be robust with respect to fitness correlations.
The problem of adaptive evolution is challenging because the advantageous mutations that are responsible for adaptation are rare (Eyre-Walker and Keightley 2007). To get an insight into the nature of the rare beneficial mutations, one can appeal to extreme value theory (EVT), for which a lot is known if the fitnesses are independent random variables (Gillespie 1983, 1991). However, experiments suggest that fitness landscapes are correlated (Carneiro and Hartl 2010; Miller et al. 2011; Szendro et al. 2012), and we know very little about the statistics of the extremes when the fitnesses are correlated (Jain et al. 2009; Seetharaman and Jain 2010; Jain 2011a). On the experimental front, while the distribution of beneficial effects has been directly measured in several experiments on microbes (Sanjuán et al. 2004; Rokyta et al. 2005, 2008; Kassen and Bataillon 2006; MacLean and Buckling 2009; Bataillon et al. 2011; Schenk et al. 2012), a complementary approach has been to study adaptive dynamics, and several quantities such as the fitness rank of the mutant at the first adaptive step (Rokyta et al. 2005, 2009), the mean fitness fixed during adaptation (Schoustra et al. 2009; Gifford et al. 2011), the number of adaptive substitutions (Schoustra et al. 2009; Gifford et al. 2011; Sousa et al. 2011) and its dependence on the initial fitness (MacLean et al. 2010; Sousa et al. 2012; Gifford et al. 2012) have been measured.



In this article, we study the mutational landscape model (Gillespie 1983, 1991) of adaptation on a class of fitness landscapes in which all three extreme value domains (Sornette 2000) for independent random variables can be accessed and fitness correlations (Perelson and Macken 1995) can be varied. The mutational landscape model is defined in genotypic sequence space and assumes strong selection and weak mutation (Gillespie 1983, 1991). These conditions are met, for example, in natural populations of HIV-1 in early infection (da Silva 2012) and can also be designed in the laboratory (Sousa et al. 2012). For populations under strong selection, a beneficial mutation gets fixed quickly, and if the probability of mutation is small enough that only a single mutation occurs per generation, the population remains monomorphic at all times. Since double and higher order mutations are neglected, such a population performs an adaptive walk in which the fitness increases at each step until a local fitness optimum is reached. Here we focus on how the walk length and the fitness effects are affected by three factors, viz. the extreme value domain of beneficial mutations, the initial distance to the local fitness optimum and the correlations amongst the fitnesses. Building upon a formalism introduced in Flyvbjerg and Lautrup (1992) and developed in Jain and Seetharaman (2011), and using ideas from extreme value theory (Sornette 2000) and large deviation theory (Touchette 2009), we analytically calculate various quantities of interest and verify our calculations using numerical simulations.



Several theoretical studies (Orr 2002, 2006; Joyce et al. 2008; Jain and Seetharaman 2011; Neidhart and Krug 2011; Jain 2011b) on the statistical properties of adaptive walks have shown that the adaptation dynamics depend on the three extreme value domains given by the Weibull, Gumbel and Fréchet distributions. In this regard, our first broad conclusion is that subregimes should be distinguished within the Fréchet domain, as the qualitative nature of the adaptive walk depends on whether the fitness distribution possesses finite moments (Joyce et al. 2008). More precisely, the character of the walk length changes (Jain 2011b) when the mean of the fitness distribution becomes infinite, while the probability of parallel evolution (Schenk et al. 2012) and the fitness fixed during the walk (this article) display different features when the variance ceases to exist. Our second conclusion pertains to adaptive walk properties that show a similar qualitative behaviour across the EVT domains for both uncorrelated and correlated fitnesses. We find that the walk length decreases with increasing initial fitness (provided the mean of the fitness distribution is finite) and that the selection coefficient also decreases during the walk. While this indicates that certain features of the adaptive walk are robust, these features may be unsuitable for ascertaining the nature of the distribution of fitness effects. Our third result concerns adaptive walk properties that show a clearly different qualitative pattern across EVT domains. Using a block model of correlated fitness landscapes (Perelson and Macken 1995), we show that the walk length decreases logarithmically with initial fitness (provided the fitness mean is finite), with a prefactor that increases with correlations in the Weibull and Gumbel domains but remains unaffected in the Fréchet domain. A striking prediction of our analysis is that during the first few steps of the adaptation process, when the population has ample beneficial mutations available, the fitness gains decrease in the Weibull domain but increase in the Fréchet domain. This property is seen to hold for both uncorrelated and correlated fitnesses. Since the fitness benefits conferred at the early adaptation stage are accessible in experiments, we believe that this result provides a useful way to determine the extreme value domain of the distribution of beneficial effects.

The plan of the article is as follows: we first describe the models used and 
a mathematical formulation in the next section. This is followed by a study of 
the average walk length and average fitness detailing their dependence on the 
EVT domain, initial fitness and fitness correlations. Finally we summarise 
our results and discuss their relevance to the experiments. 

MODELS AND METHODS 

Fitness landscapes: We study adaptation on rugged fitness landscapes with many local fitness optima, that is, sequences fitter than all of their one-mutant neighbours. If the fitnesses are uncorrelated (Jain and Krug 2007), the fitness $f$ of a sequence does not affect that of the others, and such a fitness landscape is generated by assigning to each sequence an independent and identically distributed (i.i.d.) random variable chosen from a fitness distribution $p(f)$. Adaptation occurs via rare beneficial mutations whose fitness lies in the upper tail of the fitness distribution (Gillespie 1983, 1991). Then, according to extreme value theory for independent random variables, the distribution of the advantageous mutations can be one of the three extreme value distributions, namely Weibull, Gumbel and Fréchet (Sornette 2000). Following Joyce et al. (2008), we choose the fitnesses $f \geq 0$ from a generalised Pareto distribution (GPD) defined as

$$p(f) = (1+\kappa f)^{-\frac{1+\kappa}{\kappa}} \qquad (1)$$

where $\kappa$ is the GPD exponent that can take any real value. The fitness is unbounded for $\kappa \geq 0$, and for $\kappa < 0$ it has an upper bound $u$ at $-1/\kappa$. A nice feature of distribution (1) is that all three EVT domains can be accessed by tuning a single parameter, with $\kappa < 0$, $\kappa \to 0$ and $\kappa > 0$ leading to the Weibull, Gumbel and Fréchet distributions respectively. EVT also tells us that the typical local peak fitness for a sequence of length $L$ with fitnesses distributed according to (1) is given by (Sornette 2000)

$$\hat{f} = \frac{L^{\kappa} - 1}{\kappa} \qquad (2)$$



For later reference, we note that the mean of the fitness distribution (1) is infinite for $\kappa \geq 1$ and the variance for $\kappa \geq 1/2$. We also study adaptation on correlated fitness landscapes, which are generated using a block model (Perelson and Macken 1995) in which a sequence of length $L$ is split into $B$ blocks of equal length $L_B = L/B$. The fitness of each block is an i.i.d. random variable chosen from (1), and the fitness of the sequence is the average of the block fitnesses, so that two sequences with one or more common blocks have correlated fitnesses. The fitness correlations can be tuned by changing the number of blocks, with $B = 1$ and $B = L$ producing completely uncorrelated and strongly correlated fitness landscapes respectively (Das 2010). We note that the fitness $\hat{f}_B$ (with $\hat{f}_1 = \hat{f}$) of the local fitness peak on correlated fitness landscapes is obtained on replacing $L$ by $L_B$ in (2),

$$\hat{f}_B = \frac{L_B^{\kappa} - 1}{\kappa} \qquad (3)$$

since it is the average of $B$ random variables, each of which is the best of $L_B$ rather than $L$ random variables. Thus, as the fitness correlations increase, the fitness of the local fitness maximum decreases.
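As a concrete illustration of (1)-(3), the following minimal Python sketch draws fitnesses from the GPD by inverting the cumulative probability $q(f)$ of (7) and evaluates the typical peak fitness $\hat{f}$ and its block-model counterpart $\hat{f}_B$. The function names are hypothetical, and the code is a sketch under these assumptions rather than the simulation code behind the figures.

```python
import math
import random


def sample_gpd(kappa: float, rng: random.Random) -> float:
    """One fitness from the GPD (1), drawn by inverting q(f) of eq. (7)."""
    r = rng.random()                         # uniform on [0, 1)
    if kappa == 0.0:
        return -math.log(1.0 - r)            # exponential (Gumbel) limit
    return ((1.0 - r) ** (-kappa) - 1.0) / kappa


def peak_fitness(L: int, kappa: float) -> float:
    """Typical best of L draws, eq. (2); equals ln L when kappa -> 0."""
    if kappa == 0.0:
        return math.log(L)
    return (L ** kappa - 1.0) / kappa


def block_peak_fitness(L: int, B: int, kappa: float) -> float:
    """Eq. (3): the same estimate with L replaced by the block length L / B."""
    return peak_fitness(L // B, kappa)


def block_model_fitness(block_fitnesses: list[float]) -> float:
    """Block model: the sequence fitness is the average of its block fitnesses."""
    return sum(block_fitnesses) / len(block_fitnesses)
```

For large $L$, the maximum of $L$ draws from `sample_gpd` fluctuates around `peak_fitness(L, kappa)`, and `block_peak_fitness` is smaller, in line with the statement that stronger correlations lower the local peak.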

Adaptive walk model: We consider a finite population of asexually replicating binary sequences of length $L$ evolving in the strong selection-weak mutation regime (Gillespie 1983, 1991). Due to strong selection, a beneficial mutation is fixed quickly with a probability proportional to its selection coefficient, while neutral and deleterious mutations get lost (Kimura 1962). In the weak mutation regime, the average number of single mutants (sequences differing at a single locus from the parent sequence) produced per generation is smaller than one, and double and higher order mutants can be ignored. Under the above assumptions, the population remains monomorphic at all times and performs an adaptive walk on the fitness landscape, with each step progressively fixing a higher fitness until a local fitness peak is reached. At each step of the adaptive walk on an uncorrelated fitness landscape, the population with fitness $h$ moves to a sequence with a higher fitness $f$ with the transition probability (Gillespie 1983, 1991; Jain and Seetharaman 2011)

$$T(f \leftarrow h) = \frac{f-h}{\sum_{g>h} (g-h)} \qquad (4)$$

where the sum runs over the fitnesses $g$ of the one-mutant neighbours that are fitter than $h$. When $L$ is large, we can approximate the denominator on the right hand side (RHS) of (4) by an integral to obtain (Jain and Seetharaman 2011)

$$T(f \leftarrow h) = \frac{(f-h)\,p(f)}{\int_h^u dg\,(g-h)\,p(g)} \qquad (5)$$

Note that the integral in the above equation involves the mean of the fitness distribution, and therefore (5) is valid for $\kappa < 1$. For $\kappa \geq 1$, where the mean is undefined, some results have been obtained in Jain (2011b) and shall be briefly discussed in this work.

Since in many experiments the population is founded using a single ancestor, thus keeping the initial fitness fixed (Schoustra et al. 2009; Gifford et al. 2011), we consider the adaptation process starting from a fitness $f_0$. Then the probability distribution $\mathcal{P}_J(f|f_0)$ that the population has fitness $f$ at the $J$th step of the adaptive walk, given that it started with fitness $f_0$, obeys the following recursion equation (Jain and Seetharaman 2011)

$$\mathcal{P}_{J+1}(f|f_0) = \int_{f_0}^{f} dh\, T(f \leftarrow h)\left(1 - q^L(h)\right)\mathcal{P}_J(h|f_0)\,, \quad J \geq 0 \qquad (6)$$

where $q(f)$ is the probability of having a fitness less than $f$ and $T(f \leftarrow h)$ is given by (5). Equation (6) simply means that the population moves from fitness $h$ to a higher fitness $f$ at the next step with probability (5), provided at least one fitter mutant is available, the probability of which is given by $1 - q^L(h)$. The cumulative probability $q^L(f)$ of the maximum value distribution is a smoothly varying function that increases from zero to one as the fitness $f$ increases, and it belongs to one of the three EVT domains mentioned earlier. For fitness distribution (1), since the probability

$$q(f) = \int_0^f dg\, p(g) = 1 - (1+\kappa f)^{-1/\kappa} \qquad (7)$$

and the typical fitness maximum is given by (2), we find that for large $L$ (Sornette 2000)

$$q^L(f) \approx \exp\left[-(-z)^{-1/\kappa}\right], \quad \kappa < 0 \ \text{(Weibull)} \qquad (8a)$$
$$q^L(f) \approx \exp\left(-e^{-z}\right), \quad \kappa \to 0 \ \text{(Gumbel)} \qquad (8b)$$
$$q^L(f) \approx \exp\left(-z^{-1/\kappa}\right), \quad \kappa > 0 \ \text{(Fréchet)} \qquad (8c)$$

where

$$z = \frac{|\kappa|}{\kappa}\,\frac{1+\kappa f}{1+\kappa \hat{f}}\,, \quad \kappa \neq 0 \qquad (9a)$$
$$z = f - \hat{f}\,, \quad \kappa \to 0 \qquad (9b)$$
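A single step of the walk described by (4), (6) and (7) can be sketched as follows: a step is possible with probability $1 - q^L(h)$, and if it occurs, each fitter neighbour $g$ is chosen with weight $g - h$. This is a hedged illustration of the discrete rule (4) rather than of the continuum approximation (5), and the helper names are our own.

```python
import math


def prob_fitter_exists(h: float, L: int, kappa: float) -> float:
    """1 - q^L(h): chance that at least one of L i.i.d. neighbours beats h, from eq. (7)."""
    if kappa == 0.0:
        tail = math.exp(-h)                      # 1 - q(h) for the exponential
    elif kappa < 0.0 and h >= -1.0 / kappa:
        tail = 0.0                               # h is at or above the upper bound u
    else:
        tail = (1.0 + kappa * h) ** (-1.0 / kappa)
    return 1.0 - (1.0 - tail) ** L


def transition_probabilities(h: float, neighbours: list[float]) -> list[float]:
    """Discrete transition probability (4): weight (g - h) for each fitter neighbour g."""
    weights = [max(g - h, 0.0) for g in neighbours]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(neighbours)           # local peak: the walk stops here
    return [w / total for w in weights]
```

For example, `transition_probabilities(1.0, [0.5, 2.0, 4.0])` gives `[0.0, 0.25, 0.75]`: the less fit neighbour is never chosen and the larger fitness gain is favoured.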



Equation (6) can be used to write a second order differential equation in $f$ for the distribution $P_J(f|f_0)$, defined through $\mathcal{P}_J(f|f_0) = p(f)\,P_J(f|f_0)$, which is given by (Jain and Seetharaman 2011)

$$P''_{J+1}(f|f_0) = \frac{p(f)\left(1 - q^L(f)\right)}{\int_f^u dg\,(g-f)\,p(g)}\,P_J(f|f_0)\,, \quad J \geq 1 \qquad (10)$$

where the prime refers to a derivative with respect to (w.r.t.) $f$. For the monomorphic initial condition with fixed fitness $f_0$, we have the boundary conditions

$$\mathcal{P}_J(f|f_0) = \delta(f - f_0)\,\delta_{J,0}\,, \quad f \leq f_0 \qquad (11)$$
$$\mathcal{P}'_J(f_0|f_0) = \frac{p(f_0)\left(1 - q^L(f_0)\right)}{\int_{f_0}^u dg\,(g-f_0)\,p(g)}\,\delta_{J,1} \qquad (12)$$

Equation (11) is self-explanatory, and (12) is obtained by applying (11) to the first derivative of (6) w.r.t. $f$ (Jain and Seetharaman 2011).



The main quantities that we are interested in are the average walk length and the average fitness at each step, which can be obtained from the distribution $\mathcal{P}_J(f|f_0)$ as described below. For sequences of length $L$, let $Q_J(L|f_0)$ be the probability that the adaptive walk lasts exactly $J$ steps. As the walk stops at step $J$ if all the $L$ neighbouring sequences have fitness lower than that of the currently occupied sequence, we have (Jain and Seetharaman 2011)

$$Q_J(L|f_0) = \int_{f_0}^u dh\, q^L(h)\,\mathcal{P}_J(h|f_0) \qquad (13)$$

When $L$ is large, we can obtain the average walk length $\bar{J}$ from (13) as

$$\bar{J}(L|f_0) = \sum_{J=0}^{\infty} J\, Q_J(L|f_0) \qquad (14)$$

The average walk length $\bar{J}$ above for uncorrelated fitness landscapes can be utilised to find the corresponding quantity on correlated fitness landscapes. As described in Appendix A, the average walk length for a sequence composed of $B$ blocks is equal to the sum of the walk lengths of $B$ sequences, each of length $L_B$, on uncorrelated fitness landscapes (see (A.5)). In this article, we also study how the average fitness and the selection coefficient at each step change during the walk. On uncorrelated fitness landscapes, the average fitness fixed at the $J$th step for a given $f_0$ is defined as

$$\bar{f}_J(f_0) = \int_{f_0}^u df\, f\, \mathcal{P}_J(f|f_0) \qquad (15)$$

A related quantity is the selection coefficient, which at step $J$ is given by

$$s_J = \frac{f_J - f_{J-1}}{f_{J-1}}\,, \quad J \geq 1 \qquad (16)$$

and its average by

$$\bar{s}_J = \int_0^{\infty} ds\, s\, S_J(s|f_0) \qquad (17)$$

where $S_J(s|f_0)$ is the distribution of the selection coefficient $s$ at the $J$th step in the walk, which can be determined using (16) in (6) to yield

$$S_J(s|f_0) = \int_{f_0}^{u} df \int_{f_0}^{f} dh\, \delta\!\left(s - \frac{f-h}{h}\right) T(f \leftarrow h)\left(1 - q^L(h)\right)\mathcal{P}_{J-1}(h|f_0) \qquad (18)$$
$$= \int_{f_0}^{u/(1+s)} dh\, h\, T\big(h(s+1) \leftarrow h\big)\left(1 - q^L(h)\right)\mathcal{P}_{J-1}(h|f_0) \qquad (19)$$

In the last equation, the upper limit of the integral is obtained using the fact that the fitness $f$ at the $J$th step cannot exceed $u$ and the definition (16) of the selection coefficient.

Besides carrying out analytical calculations using the formalism described above, we have verified them and supplemented our results by extensive numerical simulations, which are explained in detail in Appendix B.

LENGTH OF THE ADAPTIVE WALK 



Transition in the behavior of walk length: If the mean of the fitness distribution $p(f)$ is finite, the walk length increases with the length of the sequence, but it remains constant otherwise. To understand this transition at $\kappa = 1$, here we present a simple argument and refer the reader to Jain (2011b) for details. For $\kappa < 1$, as the transition probability (5) is nonzero for finite fitness differences, the adaptive walk goes on indefinitely for an infinitely long sequence; in other words, the adaptive walk length diverges with the sequence length $L$. A calculation for zero initial fitness and large $L$ shows that the walk cumulants increase logarithmically with sequence length (Jain 2011b). In particular, the mean walk length $\bar{J}$ increases as (Neidhart and Krug 2011; Jain and Seetharaman 2011; Jain 2011b)

$$\bar{J} \approx \beta \ln L \qquad (20)$$

where

$$\beta = \frac{1-\kappa}{2-\kappa}\,, \quad \kappa < 1 \qquad (21)$$

which shows that the walks are shorter for slowly decaying fitness distributions. For $\kappa \geq 1$, the mean of the fitness distribution is infinite and the normalisation sum in the denominator on the RHS of (4) is dominated by the largest value $\hat{f}$ amongst $L$ i.i.d. random variables (refer (2)). This implies that the transition occurs to one of the highly fit sequences with fitness of order $\hat{f}$. Since the number of such sequences is of order unity, the walk terminates in a few steps, resulting in a constant walk length. As shown in Fig. 1, a similar transition is seen at $\kappa = 1$ when the sequence length is kept fixed and the initial fitness is varied.
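The transition in walk length can be explored with a direct Monte Carlo simulation of the walk. The sketch below assumes, for simplicity, that after every substitution all $L$ one-mutant neighbours carry fresh i.i.d. fitnesses drawn from (1), with the less fit ancestor not tracked; it is an illustrative stand-in with hypothetical names, not the simulation scheme of Appendix B.

```python
import math
import random


def sample_gpd(kappa: float, rng: random.Random) -> float:
    """One draw from the GPD (1) by inverting q(f) of eq. (7)."""
    r = rng.random()
    if kappa == 0.0:
        return -math.log(1.0 - r)
    return ((1.0 - r) ** (-kappa) - 1.0) / kappa


def adaptive_walk(L: int, kappa: float, f0: float, rng: random.Random) -> tuple[int, float]:
    """One strong selection-weak mutation walk from f0; returns (steps, final fitness).

    Assumption: after each substitution the L neighbours carry fresh i.i.d. fitnesses.
    """
    h, steps = f0, 0
    while True:
        fitter = [g for g in (sample_gpd(kappa, rng) for _ in range(L)) if g > h]
        if not fitter:
            return steps, h                      # local peak reached
        # Transition probability (4): weight proportional to the fitness gain g - h.
        h = rng.choices(fitter, weights=[g - h for g in fitter])[0]
        steps += 1


def mean_walk_length_mc(L: int, kappa: float, f0: float = 0.0,
                        runs: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of the average walk length."""
    rng = random.Random(seed)
    return sum(adaptive_walk(L, kappa, f0, rng)[0] for _ in range(runs)) / runs
```

For $\kappa < 1$, the averages returned by `mean_walk_length_mc` for increasing $L$ can be compared with $\beta \ln L$ plus a constant, with $\beta$ taken from (21); for $\kappa \geq 1$, the weights in (4) are dominated by the best neighbour and the average should saturate.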

Dependence of the walk length on initial fitness: We now generalise the calculation in Jain (2011b) for zero initial fitness to discuss how the walk length changes with the initial fitness when $\kappa < 1$. We first introduce a generating function $G(x,f)$ for the distribution $P_J(f|f_0)$, defined as

$$G(x,f) = \sum_{J=1}^{\infty} P_J(f|f_0)\, x^J\,, \quad x < 1 \qquad (22)$$

Using the above equation in (10), we find that $G(x,f)$ obeys a second order ordinary differential equation in $f$ given by

$$G''(x,f) = \frac{x\, p(f)\left(1 - q^L(f)\right)}{\int_f^u dg\,(g-f)\,p(g)}\, G(x,f) \qquad (23)$$

From (11) and (12), we obtain the initial conditions as

$$G(x,f_0) = 0 \qquad (24)$$
$$G'(x,f_0) = \frac{x\left(1 - q^L(f_0)\right)}{\int_{f_0}^u dg\,(g-f_0)\,p(g)} \qquad (25)$$

Equation (23) does not appear to be exactly solvable when $L$ is finite but, as described below, it is possible to extract useful information from (23) when the sequence is infinitely long, using the fact that for a finite sequence there is a characteristic fitness scale $\hat{f}$ given by (2). It is useful to consider (23) as a function of the variable $z = z(f)$ defined in (9). With $\hat{z} = z(\hat{f})$, the general solution of the differential equation (23) may be written as

$$G(x,z) = a_1 g_1(x,z) + a_2 g_2(x,z)\,, \quad z < \hat{z} \qquad (26a)$$
$$G(x,z) = b_1 h_1(x,z) + b_2 h_2(x,z)\,, \quad z > \hat{z} \qquad (26b)$$

where $g_i, h_i$ satisfy (23) and the constants $a_1, a_2$ are determined using the initial conditions (24) and (25) at $z_0 = z(f_0) < \hat{z}$. The other constants of integration, $b_1$ and $b_2$, can be found by matching the solution $G(x,z)$ and its first derivative (w.r.t. $z$) at $z = \hat{z}$:

$$a_1(z_0)\, g_1(x,\hat{z}) + a_2(z_0)\, g_2(x,\hat{z}) = b_1 h_1(x,\hat{z}) + b_2 h_2(x,\hat{z}) \qquad (27)$$
$$a_1(z_0)\, g'_1(x,\hat{z}) + a_2(z_0)\, g'_2(x,\hat{z}) = b_1 h'_1(x,\hat{z}) + b_2 h'_2(x,\hat{z}) \qquad (28)$$

Noting that $\hat{z}$ (refer (9)) is constant in $L$ and $f_0$ but $z_0$ depends on them, on solving the above set of simultaneous linear equations we find that $b_1, b_2$ are of the form

$$b_i = b_{i1}(x)\, a_1(z_0) + b_{i2}(x)\, a_2(z_0)\,, \quad i = 1,2 \qquad (29)$$

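To find the properties of the walk length, we next define a generating function $H$ for the walk length distribution (13) as

$$H(x,L) = \sum_{J=1}^{\infty} Q_J(L|f_0)\, x^J \qquad (30)$$
$$= \int_{z(f_0)}^{z(u)} dz\, p(z)\, \frac{df}{dz}\, q^L(z)\, G(x,z) \qquad (31)$$

where $p(z)$ stands for $p(f(z))$. To proceed further, we recall that the probability $q^L(f) \to 1$ for $f \gg \hat{f}$ and $q^L(f) \to 0$ for $f \ll \hat{f}$. Then, approximating $q^L(z)$ for $z < \hat{z}$ by zero, we get

$$H(x,L) \approx \int_{\hat{z}}^{z(u)} dz\, p(z)\, \frac{df}{dz}\, q^L(z)\, G_>(x,z) \qquad (32)$$

where the subscript $>$ is used to denote the quantities when $z > \hat{z}$. In the above equation, the limits of integration are independent of $L$ and $f_0$. Then, using (26b) and (29), we can extract the $z_0$-dependence of the generating function as

$$H(x,L) \approx e^{-\hat{f}}\left[a_1(z_0)\, R_1(x) + a_2(z_0)\, R_2(x)\right]\,, \quad \kappa \to 0 \qquad (33a)$$
$$H(x,L) \approx \frac{a_1(z_0)\, R_1(x) + a_2(z_0)\, R_2(x)}{|\kappa|\,(1+\kappa\hat{f})^{1/\kappa}}\,, \quad \kappa \neq 0 \qquad (33b)$$

where

$$R_i(x) = \int_{\hat{z}}^{z(u)} dz\, \rho(z)\, q^L(z)\left[b_{1i}(x)\, h_1(x,z) + b_{2i}(x)\, h_2(x,z)\right] \qquad (34)$$

with $\rho(z) = e^{-z}$ for $\kappa \to 0$ and $\rho(z) = |z|^{-(1+\kappa)/\kappa}$ for $\kappa \neq 0$, is independent of $L$ and $f_0$. The explicit expressions for $a_1$ and $a_2$ are given in Appendix C. From (C.5) and (C.7), we see that $a_2$ decays more rapidly than $a_1$ with $L$, and therefore we may neglect the second term on the RHS of (33a) and (33b) for large $L$. Since the $n$th cumulant $\mu_n$ of the walk length is given by (Sornette 2000)

$$\mu_n = \left.\frac{d^n \ln H}{dX^n}\right|_{X=0} \qquad (35)$$

where $X = \ln x$, to leading order in $L$ we have

$$\mu_n \approx \left.\frac{d^n \ln a_1(z_0)}{dX^n}\right|_{X=0} \qquad (36)$$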
After some simple algebra, we finally obtain

$$\mu_n = 2^{-n}\left(\ln L - f_0\right)\,, \quad \kappa \to 0 \qquad (38)$$
$$\mu_n = \frac{1}{2}\left(\ln L - \frac{1}{\kappa}\ln(1+\kappa f_0)\right)\left.\frac{d^n}{dX^n}\sqrt{\kappa^2 + 4e^{X}(1-\kappa)}\,\right|_{X=0}\,, \quad \kappa \neq 0 \qquad (39)$$

Setting $n = 1$ in the above equation, we find the average walk length to be

$$\bar{J}(L|f_0) = \beta\left(\ln L - \frac{1}{\kappa}\ln(1+\kappa f_0)\right) + c \qquad (40)$$

where $\beta$ is given by (21) and the constant $c$, in which the subleading corrections are subsumed, is determined numerically. We check that the results of Jain and Seetharaman (2011) and Jain (2011b) for $f_0 = 0$ are reproduced by the above equation. We also note that since the rank $m$ of a fitness $f_0$ (with the fittest ranked one) is given by (Sornette 2000)

$$m = \frac{L}{(1+\kappa f_0)^{1/\kappa}} \qquad (41)$$

our result (40) gives $\bar{J} = \beta \ln m + c$, which has been previously obtained by Neidhart and Krug (2011). Our numerical results for the average walk length on uncorrelated fitness landscapes are compared with (40) in Fig. 1, where the numerical fits for the constant $c$ for $\kappa = -1$, 0 and 2/3 are 1.15, 1.21 and 1.55 respectively. We see a good match between the simulation data and (40), except when the initial fitnesses are close to the local fitness optimum, where the simulation data lie below the theoretical results. This discrepancy may be due to the fact that the approximation $q^L(f) = 0$ is good for fitnesses far from the local fitness peak, while we have used it for all $f < \hat{f}$ to arrive at (40).
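The closed-form results (39)-(41) are easy to evaluate. The sketch below, with hypothetical function names, computes $\beta$ from (21), $\ln m$ from (41), the mean walk length (40) for a user-supplied fitting constant $c$, and the leading-order variance from (39) with $n = 2$, using the closed-form second derivative of $s(X) = \sqrt{\kappa^2 + 4e^X(1-\kappa)}$, namely $s'' = s'(1 - s'/s)$, at $X = 0$.

```python
import math


def beta(kappa: float) -> float:
    """Prefactor of eq. (21), defined for kappa < 1."""
    if kappa >= 1.0:
        raise ValueError("beta is defined only for kappa < 1")
    return (1.0 - kappa) / (2.0 - kappa)


def log_rank(L: float, f0: float, kappa: float) -> float:
    """ln m from eq. (41); equals ln L - f0 in the limit kappa -> 0."""
    if kappa == 0.0:
        return math.log(L) - f0
    return math.log(L) - math.log1p(kappa * f0) / kappa


def mean_walk_length(L: float, f0: float, kappa: float, c: float) -> float:
    """Eq. (40): beta * ln m + c, where c has to be fitted to simulations."""
    return beta(kappa) * log_rank(L, f0, kappa) + c


def walk_length_variance(L: float, f0: float, kappa: float) -> float:
    """Leading-order second cumulant, eq. (39) with n = 2 (kappa < 1)."""
    s = 2.0 - kappa                  # sqrt(kappa^2 + 4(1 - kappa)) at X = 0
    ds = 2.0 * (1.0 - kappa) / s     # first X-derivative at X = 0
    d2s = ds * (1.0 - ds / s)        # second X-derivative at X = 0
    return 0.5 * log_rank(L, f0, kappa) * d2s
```

The constant $c$ is not predicted by the theory; for instance, `mean_walk_length(L, f0, 0.0, 1.21)` uses the fitted Gumbel value quoted above. In the Gumbel limit the variance reduces to $(\ln L - f_0)/4$, as (38) requires.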



Dependence of the walk length on fitness correlations: In the above discussion, we have assumed that the sequence fitnesses are uncorrelated. We now discuss how the walk length changes when correlated fitnesses generated using the block model (described in the last section) are considered. In the simplest situation, the initial fitness $f_0^{(b)}$ of each block may be assumed to be the same. Using $f_0^{(b)} = f_0$ for all $b = 1, \ldots, B$ in (A.5), we immediately get (Jain and Seetharaman 2011)

$$\bar{J}_B(L|f_0) = B\,\bar{J}(L_B|f_0) \qquad (42)$$

which shows that each block of length $L_B$ contributes independently to the average walk length. However, if the block fitnesses are random variables that satisfy (A.1), an average over the joint distribution $P_B(\{f_0^{(b)}\})$ of block fitnesses is also required. We thus have

$$\bar{J}_B(L|f_0) = \int_0^{Bf_0} df_0^{(1)} \cdots \int_0^{Bf_0} df_0^{(B)}\, P_B(\{f_0^{(b)}\}) \sum_{b=1}^{B} \bar{J}(L_B|f_0^{(b)}) \qquad (43)$$
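The decomposition behind (42)-(43) can be checked with a small simulation of the block model. In the hedged sketch below (hypothetical names; all blocks start at fitness zero; a substitution in block $b$ redraws only the neighbour fitnesses of that block, since the other blocks are unchanged), the walk picks a fitter neighbour with probability proportional to its gain in the average fitness and records how many substitutions occur in each block.

```python
import math
import random


def sample_gpd(kappa: float, rng: random.Random) -> float:
    """One draw from the GPD (1) by inverting q(f) of eq. (7)."""
    r = rng.random()
    if kappa == 0.0:
        return -math.log(1.0 - r)
    return ((1.0 - r) ** (-kappa) - 1.0) / kappa


def block_model_walk(L: int, B: int, kappa: float, rng: random.Random):
    """Adaptive walk on a block-model landscape; returns (total steps, steps per block).

    Illustrative assumptions: every block starts at fitness 0, and each of the
    L_B = L // B sites of block b leads to a fresh i.i.d. block fitness.
    """
    LB = L // B
    blocks = [0.0] * B
    neighbours = [[sample_gpd(kappa, rng) for _ in range(LB)] for _ in range(B)]
    steps = [0] * B
    while True:
        moves, gains = [], []
        for b in range(B):
            for g in neighbours[b]:
                if g > blocks[b]:
                    moves.append((b, g))
                    gains.append((g - blocks[b]) / B)   # gain in the average fitness
        if not moves:
            return sum(steps), steps
        b, g = rng.choices(moves, weights=gains)[0]
        blocks[b] = g
        # Only block b changed, so only its neighbour fitnesses are redrawn.
        neighbours[b] = [sample_gpd(kappa, rng) for _ in range(LB)]
        steps[b] += 1
```

Averaging the total over many runs and comparing it with $B$ times the average of single-block walks of length $L_B$ illustrates (42), since the per-block step counts do not depend on how the blocks are interleaved.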

Since the block fitnesses are i.i.d. random variables subject to the constraint (A.1), the distribution of block fitnesses can be written as

$$P_B(\{f_0^{(b)}\}) = \frac{\prod_{b=1}^{B} p(f_0^{(b)})\ \delta\!\left(Bf_0 - \sum_{b=1}^{B} f_0^{(b)}\right)}{\mathcal{N}_B(Bf_0)} \qquad (44)$$

where $\mathcal{N}_B(X)$ is the normalisation constant given by

$$\mathcal{N}_B(X) = \int_0^{X} df^{(1)} \cdots \int_0^{X} df^{(B)} \prod_{b=1}^{B} p(f^{(b)})\ \delta\!\left(X - \sum_{b=1}^{B} f^{(b)}\right) \qquad (45)$$
$$= \int_0^{X} df\, p(f)\, \mathcal{N}_{B-1}(X - f) \qquad (46)$$

with $\mathcal{N}_0(f) = \delta(f)$. Using (40) and (44) in (43), we can express the average walk length as

$$\bar{J}_B(L|f_0) = B\left(\beta \ln L_B + c\right) - \frac{B\beta}{\kappa}\, \frac{\int_0^{Bf_0} df\, p(f)\, \ln(1+\kappa f)\, \mathcal{N}_{B-1}(Bf_0 - f)}{\mathcal{N}_B(Bf_0)} \qquad (47)$$

We now analyse the above expression when the initial fitness $f_0$ is large (but $\ll \hat{f}_B$). The distribution $\mathcal{N}_B(X)$ of the sum of $B$ i.i.d. random variables can be found exactly only for exponentially and uniformly distributed random variables (Feller 2000), and by virtue of the central limit theorem (Sornette 2000) it is a Gaussian in the region where the scaled variable

$$\frac{X - B\langle f \rangle}{\sqrt{2B\sigma^2}} \qquad (48)$$

is finite, provided the mean $\langle f \rangle$ and variance $\sigma^2$ of the parent distribution $p(f)$ exist. Thus the distribution $\mathcal{N}_{B-1}(Bf_0 - f)$ in (47) may be approximated by a Gaussian if $f$ lies in the neighbourhood of $B(f_0 - \langle f \rangle)$, but the domain of integration in (47) extends far beyond the region of validity of the central limit theorem. Thus, to calculate the integral in the numerator on the RHS of (47), we require the upper tail of the distribution $\mathcal{N}_B(X)$.



This can be obtained by applying a large deviation principle (Sornette 2000; Touchette 2009), explained in Appendix D, for fitness distributions with all finite moments, and by using the result that the sum distribution decays as the fitness distribution itself for heavy-tailed distributions (Sornette 2000). Furthermore, the integral on the RHS of (47) can be rewritten as a derivative of the distribution $\mathcal{N}_B(X)$. To achieve this simplification, we consider a normalised distribution with support on the interval $[0, u]$ defined as

$$g(t) = \kappa(a-1)\,(1+\kappa t)^{-a} \qquad (49)$$

where $a < 1$ and $u = -1/\kappa$ for $\kappa < 0$, and $a > 1$ and $u = \infty$ when $\kappa > 0$. Then the distribution of the sum of $B$ i.i.d. random variables chosen from $g(t)$ is given by

$$I_B(X;a) = \int_0^u dt^{(1)} \cdots \int_0^u dt^{(B)} \prod_{b=1}^{B} g(t^{(b)})\ \delta\!\left(X - \sum_{b=1}^{B} t^{(b)}\right) \qquad (50)$$

Differentiating both sides w.r.t. $a$, we get

$$\frac{\partial I_B(X;a)}{\partial a} = \frac{B\, I_B(X;a)}{a-1} - B\int_0^u dt\, g(t)\,\ln(1+\kappa t)\, I_{B-1}(X - t; a) \qquad (51)$$

On dividing the above equation by $I_B(X;a)$ and noting that $g(t) = p(t)$ for $a = (1+\kappa)/\kappa$, it follows that the average walk length (47) can be written as

$$\bar{J}_B(L|f_0) = B\left(\beta \ln L_B + c\right) - \frac{B\beta}{\kappa}\left[\frac{1}{a-1} - \frac{1}{B}\frac{\partial \ln I_B(Bf_0;a)}{\partial a}\right]_{a = (1+\kappa)/\kappa} \qquad (52)$$



Weibull class: If the block fitnesses are distributed according to a GPD with negative $\kappa$, all the fitness moments exist and large deviation theory (see Appendix D) can be used to find the sum distribution. Using (D.1) and (D.7), we have

$$I_B(Bf_0;a) \sim e^{-B\,\mathcal{I}(f_0)} \qquad (53)$$

where the rate function $\mathcal{I}(f_0)$ is expressed in terms of $\hat{g}$, defined in (D.4), and $\omega^*$, determined using (D.6). The Laplace transform (D.4) of the distribution $g(t)$ in (49) is given by

$$\hat{g}(\omega) = e^{\eta}\left[\eta^{a-1}\,\Gamma(2-a) + (a-1)\,E_a(\eta)\right] \qquad (54)$$

and the function $\omega^*(f_0)$ is a solution of the equation

$$\frac{(a-1)\,\eta\left[E_{a-1}(\eta) - E_a(\eta)\right] - \eta^{a-1}(\eta + a - 1)\,\Gamma(2-a)}{\kappa\left[(a-1)\,\eta\, E_a(\eta) + \eta^{a}\,\Gamma(2-a)\right]} = f_0 \qquad (55)$$

where $\eta = \omega/\kappa$ (evaluated at $\omega = \omega^*$ in (55)), $E_a(\eta)$ is the exponential integral and $\Gamma(n+1) = n!$ is the gamma function. The left hand side (LHS) of the above equation is a reflected-S shaped function that decreases from its maximum value $-1/\kappa$ to zero as $\omega^*$ is increased from minus infinity. Using the asymptotic expansion of the exponential integral (Abramowitz and Stegun 1964), we find that the LHS of the above equation is given by

$$-\frac{1}{\kappa} + \frac{1-a}{\omega^*} \quad \text{for } \omega^* \to -\infty \qquad (56)$$

and

$$\frac{1}{\omega^*} \quad \text{for } \omega^* \to \infty \qquad (57)$$

If the initial fitness $f_0$ is large (small), it equals the LHS of (55) when $\omega^*$ is negative (positive). Using (56) and (57) in (D.7) for large and small initial fitnesses respectively, we find the rate function to be

$$-\mathcal{I}(f_0) \approx 1 + \ln\left((a-1)\kappa f_0\right) + \ln(1 - a\kappa f_0)\,, \quad \text{small } f_0 \qquad (58)$$
$$-\mathcal{I}(f_0) \approx 1 - a + (a-1)\ln\left(\frac{1-a}{1+\kappa f_0}\right) + \ln\Gamma(2-a)\,, \quad f_0 \to u \qquad (59)$$

which immediately gives $I_B(X;a)$ due to the large deviation principle (D.1). For small $f_0$, from (52) and (58), we thus obtain

$$\bar{J}_B(L|f_0) \approx B\left(\beta \ln L_B + c\right) - B\beta f_0 \qquad (60)$$
$$= B\,\bar{J}(L_B|f_0) \qquad (61)$$

while for large $f_0$, using (59) in (52), we get

$$\bar{J}_B(L|f_0) \approx B\left(\beta \ln L_B + c\right) - \frac{B\beta}{\kappa}\left[\kappa + \ln\left(-\frac{1}{\kappa}\right) + \ln(1+\kappa f_0) + \psi^{(0)}\!\left(1 - \frac{1}{\kappa}\right)\right] \qquad (62)$$
$$= B\,\bar{J}(L_B|f_0) - \frac{B\beta}{\kappa}\left[\kappa + \ln\left(-\frac{1}{\kappa}\right) + \psi^{(0)}\!\left(1 - \frac{1}{\kappa}\right)\right] \qquad (63)$$

where $\psi^{(0)}$ is the polygamma function of order zero (the digamma function). For $\kappa = -1$, equations (60) and (62) are compared against the numerical results in Fig. 2, and we see that the theoretical predictions match the simulation results quite well.
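The constant offset in (63) involves the digamma function, which is not part of Python's standard library. A minimal sketch, assuming the form of (63) given above and using hypothetical function names, evaluates it with the recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the standard asymptotic series of $\psi$ for large argument.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0 via recurrence and the large-x asymptotic series."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x            # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def weibull_block_offset(kappa: float, B: int) -> float:
    """Constant J_B - B * J(L_B|f0) of eq. (63), for kappa < 0 and large f0."""
    if kappa >= 0.0:
        raise ValueError("eq. (63) refers to the Weibull class, kappa < 0")
    b = (1.0 - kappa) / (2.0 - kappa)       # beta of eq. (21)
    return -(B * b / kappa) * (kappa + math.log(-1.0 / kappa)
                               + digamma(1.0 - 1.0 / kappa))
```

For $\kappa = -1$ the bracket reduces to $-1 + \psi(2) = -\gamma$, so the offset is $-(2/3)\gamma B$ with $\gamma$ the Euler-Mascheroni constant; the offset scales linearly with $B$.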
Gumbel class: For exponentially distributed fitnesses, the distribution $\mathcal{N}_B(X)$ is known exactly for any $B$ to be (Feller 2000)

$$\mathcal{N}_B(X) = \frac{X^{B-1}\, e^{-X}}{(B-1)!} \qquad (64)$$

Using this expression and taking the limit $\kappa \to 0$ in (47), we find the average walk length to be

$$\bar{J}_B(L|f_0) = B\left(\frac{1}{2}\ln L_B + c\right) - \frac{B}{2}\, \frac{\int_0^{Bf_0} df\, f\, e^{-f}\, \mathcal{N}_{B-1}(Bf_0 - f)}{\mathcal{N}_B(Bf_0)} \qquad (65)$$
$$= B\left(\frac{1}{2}\ln L_B + c\right) - \frac{B f_0}{2} \qquad (66)$$
$$= B\,\bar{J}(L_B|f_0) \qquad (67)$$

which is the same as in the case when each block fitness is $f_0$ (refer (42)). The above expression can also be obtained using the large deviation theorem in a straightforward manner. Figure 3 shows that the above expression matches well with the numerical results.
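The step from (65) to (66) rests on the average block fitness, conditioned on the block fitnesses summing to $Bf_0$, being equal to $f_0$. A short numerical check of this step uses the Erlang density (64) and the trapezoidal rule; the function names are illustrative only.

```python
import math


def erlang_density(x: float, n: int) -> float:
    """N_n(x) of eq. (64): density of the sum of n unit exponential variables."""
    return x ** (n - 1) * math.exp(-x) / math.factorial(n - 1)


def conditional_block_mean(f0: float, B: int, steps: int = 20000) -> float:
    """Ratio of integrals in eq. (65) for B >= 2, by the trapezoidal rule."""
    X = B * f0
    dx = X / steps
    total = 0.0
    for i in range(steps + 1):
        f = i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f * math.exp(-f) * erlang_density(X - f, B - 1)
    return total * dx / erlang_density(X, B)


def gumbel_block_walk_length(L: int, B: int, f0: float, c: float) -> float:
    """Eq. (66): B * (ln(L_B) / 2 + c) - B * f0 / 2."""
    return B * (0.5 * math.log(L / B) + c) - 0.5 * B * f0
```

Because the conditional mean returns $f_0$ for any $B$, the correlated result (66) coincides with $B$ copies of the uncorrelated formula (40) evaluated at $L_B$, which is the content of (67).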

Fréchet class: For $\kappa > 0$, the $(1/\kappa)$th and higher moments diverge and the large deviation principle is not applicable (Sornette 2000; Touchette 2009). In this case, the sum distribution (50) for large $Bf_0$ is given by (Sornette 2000)

$$I_B(Bf_0;a) \approx B\, g(Bf_0) \qquad (68)$$

whose tail behaviour is the same as that of the fitness distribution $g(t)$. Using (68) in (52), we immediately find

$$\bar{J}_B(L|f_0) = B\left(\beta\ln L_B + c\right) - \frac{\beta}{\kappa}\left(\ln(1 + B\kappa f_0) + \kappa(B-1)\right) \qquad (69)$$
$$\approx B\left(\beta\ln L_B + c\right) - \frac{\beta}{\kappa}\left(\ln B + \ln(\kappa f_0) + \kappa(B-1)\right) \qquad (70)$$
$$= \bar{J}(L_B|f_0) + (B-1)\left(\beta\ln L_B + c - \beta\right) - \frac{\beta}{\kappa}\ln B \qquad (71)$$

The above equation states that the average walk length decreases logarithmically with initial fitness but, unlike in the Weibull and Gumbel domains, the coefficient of $\ln f_0$ does not scale with the number of blocks. In Fig. 4, the above expression is compared with the simulation data for $\kappa = 2/3$, in which the block length, instead of the sequence length $L$, is held constant. As a result, the walk length curves for various $B$ are parallel, and we also see a good quantitative agreement between theory and simulations.
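The $B$-independence of the coefficient of $\ln f_0$ in (69)-(71) can be seen numerically. The sketch below (hypothetical names; $0 < \kappa < 1$; the constant $c$ supplied by the user) evaluates (69) and the large-$f_0$ form (71), the latter with $\bar{J}(L_B|f_0)$ taken from (40), and compares the drop in walk length when $f_0$ grows by a factor of ten for different $B$.

```python
import math


def beta(kappa: float) -> float:
    """Eq. (21)."""
    return (1.0 - kappa) / (2.0 - kappa)


def frechet_block_walk_length(L: float, B: int, f0: float, kappa: float, c: float) -> float:
    """Eq. (69), for 0 < kappa < 1."""
    b = beta(kappa)
    return (B * (b * math.log(L / B) + c)
            - (b / kappa) * (math.log1p(B * kappa * f0) + kappa * (B - 1)))


def frechet_block_walk_length_large_f0(L: float, B: int, f0: float,
                                       kappa: float, c: float) -> float:
    """Eq. (71), with J(L_B|f0) taken from eq. (40)."""
    b = beta(kappa)
    LB = L / B
    single = b * (math.log(LB) - math.log1p(kappa * f0) / kappa) + c
    return single + (B - 1) * (b * math.log(LB) + c - b) - (b / kappa) * math.log(B)


def drop_per_decade(L: float, B: int, f0: float, kappa: float, c: float) -> float:
    """Decrease of eq. (69) when f0 is multiplied by ten."""
    return (frechet_block_walk_length(L, B, f0, kappa, c)
            - frechet_block_walk_length(L, B, 10.0 * f0, kappa, c))
```

For large $f_0$, `drop_per_decade` is close to $(\beta/\kappa)\ln 10$ whatever the value of $B$, which is why the curves for different $B$ in the figure are parallel.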

FITNESS EVOLUTION DURING ADAPTIVE WALK 



Transition in the fitness behavior: Given that the fitness is $h$ at a step in the adaptive walk, the probability that fitness $f$ is fixed in the next step is given by the transition probability $T(f \leftarrow h) \propto (f-h)\,p(f)$, which favors large fitness differences. But as the fitness distribution $p(f)$ is a decreasing (increasing) function of $f$ for $\kappa > -1$ ($\kappa < -1$), the transition probability is nonmonotonic in $f$, with the most probable fitness $f^* = 1 + (1+\kappa)h$ for $\kappa > -1$, but increases monotonically for $\kappa < -1$. This property is reflected in the distribution $\mathcal{P}_J(f|f_0)$ (shown in Fig. S1 for $\kappa = -1/2$ and $-2$), which peaks at higher fitnesses as $f_0$ increases for $\kappa > -1$, while for $\kappa < -1$, irrespective of $f_0$, the most probable fitness occurs at the upper limit $u$ of the fitness distribution. The average fitness fixed at the next step, given by

$$\int_h^u df\, f\, T(f \leftarrow h) = \frac{\int_h^u df\, f\,(f-h)\,p(f)}{\int_h^u dg\,(g-h)\,p(g)} \tag{72}$$

for $\kappa < 1$, however displays a different dependence on the GPD exponent. The numerator on the RHS of the above equation contains the second moment of the fitness distribution, which is infinite for $\kappa > 1/2$. Moreover, for $\kappa < 1/2$, the fitness difference

$$\int_h^u df\,(f-h)\,T(f \leftarrow h) = \frac{2(1+\kappa h)}{1-2\kappa} \tag{73}$$

has the important property that it increases with fitness $h$ for positive $\kappa$. For an infinite sequence in which fitter mutants are always available, these two results taken together suggest that the fitness jumps keep increasing during the adaptive process for fat-tailed fitness distributions.
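As a numerical sanity check of (72) and (73), the sketch below integrates the transition probability directly and compares the mean fitness fixed at the next step with $(2+h)/(1-2\kappa)$, which is $h$ plus the fitness difference $2(1+\kappa h)/(1-2\kappa)$, and the maximum of $(f-h)\,p(f)$ with $f^* = 1 + (1+\kappa)h$. It assumes the GPD form $p(f) = (1+\kappa f)^{-(1+\kappa)/\kappa}$ for the fitness distribution, with $p(f) = e^{-f}$ at $\kappa = 0$; the helper names, grid sizes and the finite upper cutoff used for unbounded distributions are illustrative choices, not part of the original analysis.

```python
import math


def gpd_pdf(f, kappa):
    """Assumed GPD form: p(f) = (1 + kappa f)^(-(1+kappa)/kappa) for f >= 0."""
    if kappa == 0.0:
        return math.exp(-f)  # exponential limit kappa -> 0
    base = 1.0 + kappa * f
    if base <= 0.0:
        return 0.0  # beyond the upper limit u = -1/kappa of a bounded distribution
    return base ** (-(1.0 + kappa) / kappa)


def simpson(g, a, b, n):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0


def mean_next_fitness(h, kappa, upper, n=20000):
    """Mean fitness fixed next, with T(f <- h) proportional to (f - h) p(f) on [h, upper]."""
    num = simpson(lambda f: f * (f - h) * gpd_pdf(f, kappa), h, upper, n)
    den = simpson(lambda f: (f - h) * gpd_pdf(f, kappa), h, upper, n)
    return num / den


def most_probable_next(h, kappa, upper, n=20000):
    """Grid location of the maximum of (f - h) p(f) on [h, upper]."""
    grid = (h + (upper - h) * i / n for i in range(n + 1))
    return max(grid, key=lambda f: (f - h) * gpd_pdf(f, kappa))


cases = [(-0.5, 0.5, 2.0, 20000), (0.0, 1.0, 81.0, 20000), (0.25, 1.0, 1.0e4, 200000)]
for kappa, h, upper, n in cases:
    print(kappa,
          mean_next_fitness(h, kappa, upper, n), (2.0 + h) / (1.0 - 2.0 * kappa),
          most_probable_next(h, kappa, min(upper, 20.0)), 1.0 + (1.0 + kappa) * h)
```

For the bounded case the integrands are polynomials, so Simpson's rule is exact there; for $\kappa = 1/4$ the cutoff at $10^4$ only truncates a tail of relative size around $10^{-6}$.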
Mean fitness during the walk: The average fitness fixed during adaptation is defined in (15) and, as shown in Fig. 5, it increases with the number of adaptive substitutions. Below we discuss in more detail how the average fitness changes during the walk and with initial fitness on uncorrelated fitness landscapes.

Average fitness for $\kappa < 1/2$: As illustrated in Fig. …, when the number of adaptive substitutions $J \ll \bar J$ or the initial fitness is far from the local fitness peak, the average fitness fixed at the $J$th step for a sequence of length $L$ is well approximated by the corresponding quantity for an infinitely long sequence (Jain and Seetharaman 2011). Taking the infinite sequence limit on both sides of the recursion equation (6) and denoting the distribution $\mathcal{P}_J(f|f_0)$ in this limit by $\Phi_J(f|f_0)$, we have

$$\Phi_{J+1}(f|f_0) = \int_{f_0}^{f} dh\, T(f \leftarrow h)\, \Phi_J(h|f_0), \quad J \geq 0 \tag{74}$$



From the above equation, we also note that for infinitely long sequences (Jain and Seetharaman 2011)

$$\int_{f_0}^{u} df\, \Phi_J(f|f_0) = 1 \tag{75}$$

which states that the adaptive walk goes on indefinitely for an infinitely long sequence. Multiplying both sides of (74) by $f$ and integrating over it, we obtain

$$\bar f_{J+1}(f_0) = \int_{f_0}^{u} df\, f\, \Phi_{J+1}(f|f_0) \tag{76}$$

$$= \int_{f_0}^{u} dh\, \Phi_J(h|f_0)\, \frac{\int_h^{u} df\, f\,(f-h)\,p(f)}{\int_h^{u} dg\,(g-h)\,p(g)} \tag{77}$$

where we have interchanged the order of integration to arrive at the last equation. As the numerator in the integrand contains the second moment of the distribution $p(f)$, which is undefined for $\kappa > 1/2$, the above equation is valid for $\kappa < 1/2$ only. On performing the integrals in (77), we have

$$\bar f_{J+1}(f_0) = \int_{f_0}^{u} dh\, \Phi_J(h|f_0)\, \frac{2+h}{1-2\kappa} \tag{78}$$

$$= \frac{2}{1-2\kappa} + \frac{\bar f_J(f_0)}{1-2\kappa} \tag{79}$$



where we have used (75) to arrive at the last equation. The solution of the above equation is given by

$$\bar f_J(f_0) = (1-2\kappa)^{-J} f_0 + \frac{(1-2\kappa)^{-J} - 1}{\kappa} \tag{80}$$

$$= \frac{(1+\kappa f_0)(1-2\kappa)^{-J} - 1}{\kappa} \tag{81}$$



The above result for zero initial fitness matches Eq. 33 of Joyce et al. (2008) for high initial rank. Equation (81) predicts that the final fitness $u$ is approached exponentially for bounded distributions, but the fitness increases with the number of substitutions linearly for exponentially distributed fitnesses and exponentially for unbounded distributions with $0 < \kappa < 1/2$.
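The recursion (79), $\bar f_{J+1} = (2 + \bar f_J)/(1-2\kappa)$, and its closed-form solution (81) can be checked against each other in a few lines. The sketch below iterates the recursion from $\bar f_0 = f_0$ and evaluates the closed form, using $f_0 + 2J$ as the $\kappa \to 0$ limit; the function names are illustrative.

```python
def iterate_mean_fitness(f0, kappa, steps):
    """Iterate f_{J+1} = (2 + f_J) / (1 - 2 kappa) from f_0; meaningful for kappa < 1/2."""
    values = [f0]
    for _ in range(steps):
        values.append((2.0 + values[-1]) / (1.0 - 2.0 * kappa))
    return values


def closed_form_mean_fitness(f0, kappa, J):
    """Closed form ((1 + kappa f0)(1 - 2 kappa)^(-J) - 1) / kappa; f0 + 2J when kappa -> 0."""
    if kappa == 0.0:
        return f0 + 2.0 * J  # linear growth for exponentially distributed fitnesses
    return ((1.0 + kappa * f0) * (1.0 - 2.0 * kappa) ** (-J) - 1.0) / kappa


# Bounded example kappa = -1 (upper limit u = 1): the mean fitness approaches u geometrically.
print(iterate_mean_fitness(0.2, -1.0, 5))
```

For $\kappa < 0$ the fixed point of the recursion is $-1/\kappa = u$, which is why the bounded case converges, while for $0 < \kappa < 1/2$ the factor $(1-2\kappa)^{-1} > 1$ makes the mean grow geometrically.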

It is useful to consider the fitness improvement $\Delta \bar f_J$ during successive steps, defined as

$$\Delta \bar f_J = \bar f_J - \bar f_{J-1} \tag{82}$$

where the overbar represents averaging over only those walks that reach the $J$th step. For infinitely long sequences, as the $J$th step is definitely taken (see (75)), we have

$$\Delta \bar f_J = \bar f_J - \bar f_{J-1} = 2(1+\kappa f_0)(1-2\kappa)^{-J} \tag{83}$$



A similar expression for fitness effects has been obtained by Joyce et al. (2008), but its consequences were not discussed. The above result has also been obtained for the special case of exponentially distributed fitnesses and zero initial fitness by Kryazhimskiy et al. (2009). For fixed initial fitness, (83) shows that for $\kappa < 0$, the fitness benefit decreases exponentially as the walk proceeds (diminishing returns), while for $\kappa \to 0$, the average fitness increases linearly during successive steps, conferring a constant benefit (constant returns), and for $0 < \kappa < 1/2$, the fitness difference increases exponentially fast, with each step conferring a higher benefit than the previous one (accelerating returns). Similar qualitative trends are seen when the initial fitness is varied, since the fitness gain changes linearly with $f_0$ and the sign of the slope changes as $\kappa$ crosses zero. In Figs. 6 and 7, the simulation results and the above theoretical prediction (83) for an infinitely long sequence are compared, and we see good agreement when the local fitness maximum is far away. We have also measured the standard deviation about the mean fitness fixed, and our simulations indicate that as the walk proceeds, the fluctuations increase for $\kappa > 0$, remain almost constant for $\kappa \to 0$ and decrease for $\kappa < 0$.
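The three regimes can be read off from (83) directly: the ratio of successive gains is $1/(1-2\kappa)$ for every $J$, which is below, equal to or above one according to the sign of $\kappa$. A minimal sketch, with illustrative function names, that evaluates $\Delta \bar f_J = 2(1+\kappa f_0)(1-2\kappa)^{-J}$ and labels the pattern:

```python
def fitness_gain(f0, kappa, J):
    """Mean gain 2 (1 + kappa f0) (1 - 2 kappa)^(-J) at step J of an infinite sequence, kappa < 1/2."""
    return 2.0 * (1.0 + kappa * f0) * (1.0 - 2.0 * kappa) ** (-J)


def returns_pattern(f0, kappa):
    """Classify successive gains; their ratio is 1 / (1 - 2 kappa) at every step."""
    ratio = fitness_gain(f0, kappa, 2) / fitness_gain(f0, kappa, 1)
    if abs(ratio - 1.0) < 1e-12:
        return "constant returns"
    return "accelerating returns" if ratio > 1.0 else "diminishing returns"


for kappa in (-0.5, 0.0, 0.25):
    print(kappa, returns_pattern(0.5, kappa), [round(fitness_gain(0.5, kappa, J), 4) for J in range(1, 5)])
```

Summing the gains over the first $J$ steps is a geometric series and returns $\bar f_J - f_0$ with $\bar f_J$ from (81), which is a convenient consistency check.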

Average fitness for $1/2 < \kappa < 1$: When the second moment of the fitness distribution becomes infinite, we work with a sequence of finite length to find how the average fitness diverges with $L$. Since the adaptation process is over when the fitness fixed is of the order of the fitness of the local fitness optimum, we truncate the fitness distribution (1) at the local fitness maximum $\tilde f$ (Sornette 2000). From the definition (15) and the recursion equation (6), we have



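$$\bar f_{J+1}(f_0) = \int_{f_0}^{\tilde f} df\, f\, \mathcal{P}_{J+1}(f|f_0) \tag{84}$$

$$= \int_{f_0}^{\tilde f} dh\, \left(1 - q^L(h)\right)\mathcal{P}_J(h|f_0)\, \frac{\int_h^{\tilde f} df\, f\,(f-h)\,p(f)}{\int_h^{\tilde f} dg\,(g-h)\,p(g)} \tag{85}$$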



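For large but finite $L$, we find that

$$\frac{\int_h^{\tilde f} df\, f\,(f-h)\,p(f)}{\int_h^{\tilde f} dg\,(g-h)\,p(g)} \approx -\frac{2+h}{2\kappa-1} + \frac{(1-\kappa)(1+\kappa h)^{(1-\kappa)/\kappa}}{(2\kappa-1)\kappa^2}\, L^{2\kappa-1} \tag{86}$$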



where we have neglected terms that are subleading for large $L$. We check that for $\kappa < 1/2$, we recover the result in (78). Using the above expression in (85), we get

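$$\bar f_{J+1}(f_0) \approx -\frac{2+\bar f_J(f_0)}{2\kappa-1} + \frac{(1-\kappa)\,L^{2\kappa-1}}{(2\kappa-1)\kappa^2}\int_{f_0}^{\tilde f} dh\, \mathcal{P}_J(h|f_0)\,(1+\kappa h)^{(1-\kappa)/\kappa} \tag{87}$$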



At the first step, using (11), we immediately get

$$\bar f_1(f_0) \approx -\frac{2+f_0}{2\kappa-1} + \frac{(1-\kappa)(1+\kappa f_0)^{(1-\kappa)/\kappa}}{\kappa^2(2\kappa-1)}\, L^{2\kappa-1} \tag{88}$$

which shows that for large $L$, the average fitness $\bar f_1$ scales as $L^{2\kappa-1}$. Beyond the first step, an exact expression for $\mathcal{P}_J(h|f_0)$ is not available, but if we assume that this distribution decays faster than $h^{-1/\kappa}$, one can replace the upper limit in the integral on the RHS of (87) by infinity to get

$$\bar f_{J+1}(f_0) \approx -\frac{2+\bar f_J(f_0)}{2\kappa-1} + A_J(\kappa, f_0)\, L^{2\kappa-1} \tag{89}$$

where $A_J(\kappa, f_0)$ represents the resulting integral and the other prefactors. The above equation thus suggests that the fitness at the $J$th step increases with the sequence length as $L^{2\kappa-1}$, which is supported by the numerical simulations shown in the inset of Fig. 4 for $\kappa = 2/3$. The fitness difference between successive steps can also be read off from Fig. 4 (also see Fig. S2), and we find that it increases, as in the case $0 < \kappa < 1/2$.
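To see the $L^{2\kappa-1}$ growth concretely, the sketch below evaluates the ratio on the left of (86) by quadrature, with the fitness distribution truncated at the local peak, and compares it with the two leading terms $-(2+h)/(2\kappa-1) + (1-\kappa)(1+\kappa h)^{(1-\kappa)/\kappa}L^{2\kappa-1}/[(2\kappa-1)\kappa^2]$. It assumes the GPD form $p(f) = (1+\kappa f)^{-(1+\kappa)/\kappa}$ and that (2) fixes the local peak through $L(1+\kappa \tilde f)^{-1/\kappa} = 1$, i.e. $\tilde f = (L^\kappa - 1)/\kappa$; these forms and the helper names are our assumptions. Because subleading corrections are dropped, only approximate agreement (a few per cent for $\kappa = 2/3$, $L = 10^6$) should be expected.

```python
def gpd_pdf(f, kappa):
    """Assumed GPD density p(f) = (1 + kappa f)^(-(1+kappa)/kappa), here with kappa > 0."""
    return (1.0 + kappa * f) ** (-(1.0 + kappa) / kappa)


def simpson(g, a, b, n=100000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0


def local_peak(L, kappa):
    """Assumed local peak fitness: solves L * (1 + kappa f)^(-1/kappa) = 1."""
    return (L ** kappa - 1.0) / kappa


def truncated_ratio(h, kappa, L):
    """Mean fitness fixed next from h, with p(f) truncated at the local peak."""
    top = local_peak(L, kappa)
    num = simpson(lambda f: f * (f - h) * gpd_pdf(f, kappa), h, top)
    den = simpson(lambda f: (f - h) * gpd_pdf(f, kappa), h, top)
    return num / den


def leading_terms(h, kappa, L):
    """Constant term plus the L^(2 kappa - 1) term of the large-L expansion."""
    c = 2.0 * kappa - 1.0
    growth = (1.0 - kappa) * (1.0 + kappa * h) ** ((1.0 - kappa) / kappa) / (c * kappa ** 2)
    return -(2.0 + h) / c + growth * L ** c


kappa, h = 2.0 / 3.0, 1.0
for L in (1.0e3, 1.0e6):
    print(L, truncated_ratio(h, kappa, L), leading_terms(h, kappa, L))
```

At $\kappa = 2/3$ the neglected terms are of order one, so the numerical ratio stays slightly above the two-term estimate even at large $L$; the relative gap shrinks as the $L^{1/3}$ term grows.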

Mean selection coefficient during the walk: Like the average fitness $\bar f_J$, the average selection coefficient $\bar s_J$ is also undefined for $\kappa > 1/2$. To see this, consider the distribution $\Psi_J(s|f_0)$ of the selection coefficient for an infinitely long sequence. Taking the limit $L \to \infty$ on both sides of (…), we obtain



$$\Psi_J(s|f_0) = (1-\kappa)\, s \int_{f_0}^{u} dh\, \frac{p[h(s+1)]}{(1+\kappa h)^{(\kappa-1)/\kappa}}\, h^2\, \Phi_{J-1}(h|f_0) \tag{90}$$

The average selection coefficient (17) in the infinite-$L$ limit is then given by

$$\bar s_J(f_0) = (1-\kappa)\int_{f_0}^{u} dh\, \frac{h^2\, \Phi_{J-1}(h|f_0)}{(1+\kappa h)^{(\kappa-1)/\kappa}} \int_0^{\infty} ds\, s^2\, p[h(s+1)] \tag{91}$$

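As the inner integral over $s$ in the last equation diverges for $\kappa > 1/2$, the average selection coefficient exists only for $\kappa < 1/2$, and we get

$$\bar s_J(f_0) = \frac{2\kappa}{1-2\kappa} + \frac{2}{1-2\kappa}\int_{f_0}^{u} dh\, \frac{\Phi_{J-1}(h|f_0)}{h}, \quad \kappa < 1/2 \tag{92}$$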

Note that while the expectation value of the fitness $h$ is involved in the expression (78) for the average fitness, the average of $1/h$ appears in the above equation. Using the inequality that the arithmetic mean is always greater than or equal to the harmonic mean, we have



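$$\bar s_J(f_0) \geq \frac{2\kappa}{1-2\kappa} + \frac{2}{(1-2\kappa)\,\bar f_{J-1}(f_0)} \tag{93}$$

$$= \frac{2\kappa(1+\kappa f_0)}{(1-2\kappa)(1+\kappa f_0) - (1-2\kappa)^J} \tag{94}$$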



where we have used (81). Although one can thus find a lower bound on the average selection coefficient, to obtain an expression for $\bar s_J$ we need the distribution $\Phi_J(f|f_0)$, which is available only for some special cases, and we now discuss them. At the first step in the walk, using (11) in (90), we immediately have



$$\Psi_1(s|f_0) = (1-\kappa)\, f_0^2\, s\, \frac{p[(s+1)f_0]}{(1+\kappa f_0)^{(\kappa-1)/\kappa}}, \quad 1 + \kappa(s+1)f_0 > 0 \tag{95}$$

which is a nonmonotonic function of $s$ for $\kappa > -1$ but increases monotonically for $\kappa < -1$. The above equation also gives

$$\bar s_1(f_0) = \frac{2(1+\kappa f_0)}{(1-2\kappa)\, f_0}, \quad \kappa < 1/2 \tag{96}$$

which decreases as the initial fitness increases (see Fig. 5). We also mention that if the above distribution (95) is also averaged over the initial fitness, the mean selection coefficient diverges for all $\kappa < 1$ (see the discussion in S3). For exponentially distributed fitnesses, on solving the differential equation (…) in the infinite sequence length limit (Jain and Seetharaman 2011), we find that the distribution $\Phi_J(f|f_0)$ is given by



$$\Phi_J(f|f_0) = \frac{(f-f_0)^{2J-1}\, e^{-(f-f_0)}}{(2J-1)!} \tag{97}$$



thus leading to the distribution of the selection coefficient

$$\Psi_J(s|f_0) = \frac{s\, e^{-s f_0}}{(1+s)^{2J}}\left[\left(f_0(1+s) + 2J - 2\right)^2 + 2J - 2\right] \tag{98}$$
The above result for an infinitely long sequence compares well with the simulation results for a finite sequence shown in the inset of Fig. 3. We find that the distribution is nonmonotonic in $s$ and decays faster with increasing $J$ or $f_0$. Using (97) in (92), we also obtain



$$\bar s_J(f_0) = \frac{2}{(2J-3)!}\int_0^{\infty} dy\, \frac{y^{2J-3}\, e^{-y}}{y+f_0} = 2\, e^{f_0}\, E_{2J-2}(f_0), \quad J > 1 \tag{99}$$



where $E_n(x)$ is the exponential integral function. Using the asymptotic expansion of the exponential integral (Abramowitz and Stegun 1964), we find that $\bar s_J \approx 2/f_0$ for large $f_0$, and therefore the selection coefficient decreases with initial fitness. For large $J$, we have $\bar s_J \approx 1/J$ on using the representation of $E_n(x)$ for large $n$ (Abramowitz and Stegun 1964). Thus the mean selection coefficient decreases with increasing initial fitness and during the course of the walk. These qualitative properties are also exhibited when fitness distributions with GPD exponent in other extreme value domains are considered, as illustrated in Fig. 5.
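For exponentially distributed fitnesses and an infinitely long sequence, $T(f \leftarrow h) = (f-h)\,e^{-(f-h)}$, so each fitness jump is a Gamma(2, 1) variable independent of the current fitness, consistent with $\Phi_J$ being a Gamma($2J$) density in $f - f_0$. The sketch below uses this to estimate $\bar s_J$ by Monte Carlo and compares it with the integral $\frac{2}{(2J-3)!}\int_0^\infty dy\, y^{2J-3}e^{-y}/(y+f_0)$ evaluated by quadrature, with $\bar s_1 = 2/f_0$ at the first step. Function names, sample sizes, the seed and the quadrature cutoff are illustrative choices.

```python
import math
import random


def mean_s_integral(J, f0, upper=80.0, n=20000):
    """Quadrature of (2/(2J-3)!) * int_0^inf y^(2J-3) e^(-y) / (y + f0) dy for J > 1; 2/f0 at J = 1."""
    if J == 1:
        return 2.0 / f0
    k = 2 * J - 3

    def integrand(y):
        return y ** k * math.exp(-y) / (y + f0)

    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return 2.0 * (total * step / 3.0) / math.factorial(k)


def mean_s_monte_carlo(J, f0, runs=200000, seed=1):
    """Average of s = f/h - 1 at step J of an infinite-sequence walk with exponential fitnesses."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(runs):
        h = f0
        for _ in range(J - 1):
            h += rng.gammavariate(2.0, 1.0)  # fitness jumps at the earlier steps
        acc += rng.gammavariate(2.0, 1.0) / h  # (f - h)/h at step J
    return acc / runs


for J in (1, 2, 3):
    print(J, mean_s_integral(J, 1.0), mean_s_monte_carlo(J, 1.0, runs=50000))
```

Both columns should decrease with $J$, in line with the statement that the mean selection coefficient decreases during the course of the walk.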

Effect of correlations on fitness evolution: We now discuss how the fitness fixed during evolution behaves on correlated fitness landscapes. For a given set $\{f_0^{(b)}\}$ of initial block fitnesses, the average fitness fixed when the sequence is partitioned into $B$ blocks is given by

$$\bar f_{J,B} = \frac{1}{B}\sum_{i=1}^{B} \overline{\left(\sum_{b \neq i} f_{J-1}^{(b)} + f_J^{(i)}\right)} \tag{100}$$

where the overbar represents averaging with respect to the distribution $\mathcal{P}_J(f_J^{(i)}, \{f_{J-1}^{(b \neq i)}\} \,|\, \{f_0^{(b)}\})$ that at step $J$ an adaptive substitution occurs in the $i$th block (and $f_J^{(b)} = f_{J-1}^{(b)}$ for $b \neq i$). Assuming that this distribution can be factorised over the blocks, and using the result (75) for a long sequence in (100), we find that
$$\bar f_{J,B} = \bar f_{J-1,B} + \frac{\bar f_J - \bar f_{J-1}}{B} \tag{102}$$

where (102) is obtained on averaging over the initial block fitnesses under the constraint (A.1), and $\bar f_J$ is the average fitness fixed at the $J$th step on an uncorrelated fitness landscape. At the first step in the walk, using (81) in (102), we immediately get

$$\bar f_{1,B} - f_0 = \frac{\bar f_1 - f_0}{B} = \frac{2(1+\kappa f_0)}{B(1-2\kappa)} \tag{103}$$

which states that the fitness difference at the first step on a correlated fitness landscape is $1/B$ times the fitness difference at the first step on an uncorrelated fitness landscape and depends linearly on the initial sequence fitness. However, our simulation data for the fitness difference at the first step, shown in Fig. 8, do not agree with the above expectation except for exponentially distributed fitnesses, which suggests that the factorisation property of the distribution $\mathcal{P}_J$ does not hold in general. For $\kappa \to 0$, due to (102) and (81), we have



$$\bar f_{J,B} \approx f_0 + \frac{2J}{B} \tag{104}$$



At the first few steps, the above expression is consistent with the simulation results, as can be seen in the inset of Fig. 8. For $\kappa \neq 0$, although the fitness difference does not obey (103), the trend with $\kappa$ shown in Figs. 8 and S3 is similar to that in the uncorrelated case, where the fitness difference increases for $\kappa > 0$ and decreases for $\kappa < 0$.

DISCUSSION 



In recent years, the adaptive walk model (Gillespie 1983, 1991) has been the subject of many theoretical works (Orr 2002, 2006; Joyce et al. 2008; Kryazhimskiy et al. 2009; Jain 2011b; Jain and Seetharaman 2011; Neidhart and Krug 2011; Filho et al. 2012). The model is attractive for several reasons: it is simple to define, its properties depend on a rather small number of parameters and, moreover, some of its predictions are supported by experiments (Rokyta et al. 2005). This model is expected to work well for small populations, as it assumes the number of mutants produced per generation to be small. However, even for a large adapting population in which many competing beneficial mutants are initially produced (Desai and Fisher 2003), we expect it to be applicable in the late adaptation regime, when the population has access to relatively few beneficial mutations.

In this article, we have investigated how the statistical properties of the adaptive walk depend on the tail behavior of the fitness distribution, correlations amongst fitnesses and the initial fitness. The exponent $\kappa$ in the fitness distribution (1), which determines the nature of the distribution of beneficial fitness effects, has been measured in experiments and, interestingly, all three EVT domains have been seen. Although many early studies supported the Gumbel domain (Sanjuan et al. 2004; Rokyta et al. 2005; Kassen and Bataillon 2006; MacLean et al. 2010), recently the Weibull (Rokyta et al. 2008; Bataillon et al. 2011) and Frechet domains (Schenk et al. 2012) have also been documented. Our theoretical analysis based on (5) and (6) spans the three EVT domains but requires that the fitness distribution has a finite mean ($\kappa < 1$). Experiments suggest that fitness landscapes are correlated (Carneiro and Hartl 2010; Miller et al. 2011; Szendro et al. 2012), but we know little about the fitness correlations quantitatively. Here we have studied the adaptive walk properties on correlated fitness landscapes using a block model (Perelson and Macken 1995) in which a sequence is assumed to be composed of many blocks. Recent experiments (Rokyta et al. 2009; Gifford et al. 2011; Sousa et al. 2012) have also sought to understand how adaptation dynamics depend on the initial fitness. As our analytical formulation is based on the assumption that a large number of novel mutants are available, the formulae obtained here hold for moderately high initial fitnesses that are far from the local fitness optimum. Our theoretical analysis is verified and supplemented with numerical simulations that also cover parameters outside the range of validity of the analytical results.

Summary of results and relation to experiments: In this article, we studied the adaptive walk length, which gives the number of substitutions until the population reaches a fitness peak, and the evolution of fitness and selection coefficient during this process. As shown in Fig. 1, the walk length changes its qualitative behavior at GPD exponent $\kappa = 1$, when the fitness distribution ceases to have a finite mean. For $\kappa < 1$, the walk length depends on the initial fitness and, as one may expect, the number of adaptive steps decreases as the initial fitness increases. But for $\kappa > 1$, the population jumps to a sequence with fitness of the order of the local fitness peak with no memory of the initial fitness. The adaptive walk length has been seen to decrease with increasing initial fitness in some experiments (Rokyta et al. 2009; Sousa et al. 2012), and its insensitivity to initial fitness has also been observed (Gifford et al. 2011). As illustrated in the inset of Fig. 1, the mean fitness fixed also shows a transition as a result of the diverging second and first moments of the fitness distribution at $\kappa = 1/2$ and $1$, respectively. For $\kappa < 1/2$, as the RHS of (89) becomes independent of the sequence length (and hence of $\tilde f$) for large $L$, the fitness fixed during the initial steps of the walk depends on the initial fitness but not on the local peak fitness. However, for $1/2 < \kappa < 1$, the fitness fixed depends on both the initial fitness and the local peak fitness. From (2) and (89), we see that the fitness fixed increases as $\tilde f^{(2\kappa-1)/\kappa}$, which scales sublinearly with $\tilde f$ for $\kappa < 1$. Finally, for $\kappa > 1$, it depends only on the fitness of the local optimum, as the population immediately jumps to a fitness close to the local optimum (see inset of Fig. 1). It should be possible to test these predictions in experiments by measuring the fitness fixed at the first few steps for two populations with different wild type fitnesses.

As Fig. 5 shows, the fitness fixed during the adaptive walk increases with the number of substitutions and initial fitness in all EVT domains, but the fitness difference between successive steps depends on how fast the fitness distribution (1) decays (refer Figs. 6, 7 and S2). In the Weibull domain, the fitness benefits decrease as the walk proceeds or the starting fitness is increased, leading to long walks for bounded fitness distributions. In contrast, in the Frechet domain, each adaptive step leads to an increasing fitness gain, as a result of which the population quickly exhausts the supply of beneficial mutations and the walk is short. This behavior of the fitness gain is robust with respect to fitness correlations, as attested by Figs. 8 and S3. To understand these trends in the fitness gain, we note that due to (73), the fitness difference fixed is related to the ratio of the second moment to the first moment of the fitness distribution. As the second moment, and hence the fluctuations in fitness, increase with $\kappa$, fitnesses distributed according to fat-tailed distributions can be large and are likely to get fixed, since large fitness differences are favored by the transition probability (5), thus leading to the pattern of accelerating returns in the Frechet domain. A negative correlation between initial fitness and fitness gain has been observed (Bull et al. 2000; MacLean et al. 2010), and an increasing fitness gain in successive steps has been seen in small populations (Burch and Chao 1999).
If the mean of the fitness distribution is finite, the walk length decreases logarithmically as the initial fitness increases (see Fig. 1). The log-dependence may be understood using the fact that the walk terminates when the fitness fixed $\bar f_J$ given by (81) is of the order of the local peak fitness $\tilde f$ (Jain and Seetharaman 2011). However, to obtain the correct prefactor $\beta$, which encodes the de… (Neidhart and Krug 2011; Jain 2011b). The effect of fitness correlations is to increase the walk length (Orr 2006), but the logarithmic relationship between initial fitness and walk length still holds, as supported by Figs. 2-4. However, the coefficient of the log fitness in the expression for walk length scales with correlations in the Weibull and Gumbel domains but is independent of fitness correlations in the Frechet domain. In experiments measuring the walk length, the walk is assumed to terminate if the fitness remains constant over some time period, but that need not imply that the adaptation is over (Rokyta et al. 2009). Besides, most experiments (Gifford et al. 2011) cannot measure mutations whose selection coefficient is below a threshold value and miss out on mutations conferring slight benefit, thus underestimating the walk length. For these reasons, a quantitative comparison between experiments and theory (in which the local fitness peak is known) seems difficult.

For $\kappa < 0$, since the fitness difference is a nonincreasing function of $J$ and $f_0$ (refer (83)), one may expect the selection coefficient (16) to decrease with increasing $J$ or $f_0$. For $\kappa > 0$, although the fitness benefit increases, the selection coefficient still decreases, as shown in Figs. 5 and S4b. Moreover, as Fig. S4b shows, the qualitative behavior of the selection coefficient is unaffected by correlations amongst the fitnesses. These results suggest that the selection coefficient may not be a useful quantity to measure if one wishes to obtain insight into the nature of the beneficial mutations and fitness correlations (also see Kryazhimskiy et al. (2009)). Our theoretical results described above are consistent with the experimental studies on Aspergillus nidulans (Schoustra et al. 2009; Gifford et al. 2011), in which the mean selection coefficient is observed to decrease as the walk proceeds. In a recent study on Escherichia coli in which the initial fitness was varied by introducing deleterious mutations with different fitness costs (Sousa et al. 2012), the mean selective effect was found to be larger and the distribution of selective effects broader for a poorer initial condition. These trends are also consistent with our results in the inset of Fig. 3 and in Fig. 5.



Our theoretical results may be useful in further analysing the experimental data. For example, in Sousa et al. (2012), since the local fitness optimum which the populations approach is fixed, a truncated fitness distribution is expected. It would be interesting to check whether our prediction that the fitness difference decreases with increasing initial fitness for bounded distributions is supported by the fitness data in this experiment. We close this article with a discussion of a recent experiment on Aspergillus nidulans (Gifford et al. 2011) in which the average walk length was found to be short (about two steps) and insensitive to the starting fitness. The latter result is rationalised by the authors by noting that the size of the selective effect at the first step increased with decreasing initial fitness. However, as discussed above and shown in Figs. 5 and S4, this is a general trend that holds across EVT domains, whereas the walk length is not insensitive to initial fitness for all $\kappa$. Thus the weak dependence of walk length on starting fitness is not explained by increased mutational size, and we expect that the GPD exponent may be greater than unity in this experiment. Insensitivity of walk length to initial conditions and a decrease in fitness gain are also expected for any $\kappa$ if the population is too close to the local fitness optimum. In this and other experiments (Gifford et al. 2011; Sousa et al. 2012), the distance to the fitness peak has been gauged by the ratio of the initial fitness to that of a (known) local fitness optimum. But it is important to note that the initial rank, which gives the number of better mutants available at the start of the adaptation process (Orr 2002), is not a linear function of the ratio $f_0/\tilde f$ due to (41). For a sequence of length $L = 10^3$, the local peak fitness $\tilde f$ is given by (2), and an initial fitness which is half that of the local fitness peak will have a rank of 500, 32 and 4 for $\kappa = -1$, $0$ and $1/2$, respectively. Then, in the absence of information about the GPD exponent, it is not clear how far the population is from the local fitness optimum. Measurement of the walk length for initial fitnesses smaller than those used in the experiment of Gifford et al. (2011), and of the fitnesses fixed during evolution, may help in understanding the properties of adaptation in A. nidulans.
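The quoted ranks can be reproduced with a short calculation. The sketch assumes that (2) defines the local peak through $L\,Q(\tilde f) = 1$, with $Q(f) = (1+\kappa f)^{-1/\kappa}$ the probability that a random fitness exceeds $f$ (so $\tilde f = (L^\kappa - 1)/\kappa$, or $\ln L$ for $\kappa \to 0$), and takes the initial rank to be the expected number of fitter one-mutant neighbours, $L\,Q(f_0)$. Both forms and the function names are our assumptions.

```python
import math


def local_peak_fitness(L, kappa):
    """Assumed local peak: L * Q(f_peak) = 1 with Q(f) = (1 + kappa f)^(-1/kappa)."""
    if kappa == 0.0:
        return math.log(L)
    return (L ** kappa - 1.0) / kappa


def initial_rank(f0, L, kappa):
    """Expected number of fitter one-mutant neighbours, L * Q(f0)."""
    if kappa == 0.0:
        return L * math.exp(-f0)
    return L * (1.0 + kappa * f0) ** (-1.0 / kappa)


L = 1000
for kappa in (-1.0, 0.0, 0.5):
    f0 = 0.5 * local_peak_fitness(L, kappa)  # initial fitness at half the local peak
    print(kappa, initial_rank(f0, L, kappa))
```

The printed ranks come out close to 500, 32 and 4, so the same fitness ratio $f_0/\tilde f = 1/2$ corresponds to very different numbers of available beneficial mutations.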
A Average walk length on correlated fitness landscapes



In the block model of correlated fitness landscapes (Perelson and Macken 1995), the fitness of a sequence which is divided into $B$ blocks is the average of the block fitnesses. Thus, if the initial fitness of the $b$th block is $f_0^{(b)}$, the initial fitness of the whole sequence is given by

$$f_0 = \frac{1}{B}\sum_{b=1}^{B} f_0^{(b)} \tag{A.1}$$

Similarly, if $m_b$ is the number of adaptive substitutions in the $b$th block, which occurs with probability $Q_{m_b}(L_B|f_0^{(b)})$, the length of the adaptive walk is $J_B = \sum_{b=1}^{B} m_b$. Using the fact that the block fitnesses evolve independently (Perelson and Macken 1995), we find that the average walk length on correlated fitness landscapes can be written as (Jain and Seetharaman 2011)

$$\bar J_B(L|f_0) = \sum_{J=1}^{\infty} J \sum_{m_1=0}^{\infty}\cdots\sum_{m_B=0}^{\infty} \delta_{m_1+\cdots+m_B,\,J}\, \prod_{b=1}^{B} Q_{m_b}(L_B|f_0^{(b)}) \tag{A.2}$$

$$= \sum_{m_1=0}^{\infty}\cdots\sum_{m_B=0}^{\infty} (m_1+\cdots+m_B) \prod_{b=1}^{B} Q_{m_b}(L_B|f_0^{(b)}) \tag{A.3}$$

$$= \sum_{b=1}^{B} \sum_{m_b=1}^{\infty} m_b\, Q_{m_b}(L_B|f_0^{(b)}) \tag{A.4}$$

$$= \sum_{b=1}^{B} \bar J(L_B|f_0^{(b)}) \tag{A.5}$$

where $\bar J(L_B|f_0^{(b)})$ is the average walk length for a sequence of length $L_B$ with initial fitness $f_0^{(b)}$ on uncorrelated fitness landscapes.
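The passage from (A.3) to (A.5) only uses the independence of the blocks and the normalisation of each $Q_m$. A brute-force check with made-up per-block distributions (the numbers below are hypothetical and not taken from the model) enumerates every tuple $(m_1, \ldots, m_B)$ as in (A.3) and compares the result with the sum of block means in (A.5).

```python
from itertools import product


def mean_total_steps(block_dists):
    """Mean of m_1 + ... + m_B by enumerating all tuples of independent blocks, as in (A.3)."""
    total = 0.0
    for ms in product(*[range(len(q)) for q in block_dists]):
        prob = 1.0
        for q, m in zip(block_dists, ms):
            prob *= q[m]  # independent blocks: joint probability is a product
        total += sum(ms) * prob
    return total


def sum_of_block_means(block_dists):
    """Right-hand side of (A.5): sum over blocks of the mean number of substitutions."""
    return sum(sum(m * qm for m, qm in enumerate(q)) for q in block_dists)


# Hypothetical per-block distributions Q_m for m = 0, 1, 2, ...
q1 = [0.1, 0.3, 0.4, 0.2]
q2 = [0.25, 0.5, 0.25]
print(mean_total_steps([q1, q2]), sum_of_block_means([q1, q2]))
```

Both functions return 2.7 (up to rounding) for these inputs, the sum of the block means 1.7 and 1.0.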

B Simulation procedure 



On uncorrelated fitness landscapes, the initial fitness is fixed at $f_0$ and, at every step, $L$ new fitnesses are generated from the fitness distribution (1). The walk proceeds by moving from the current fitness $h$ to a higher fitness $f$, which is chosen according to the transition probability (5). The fitnesses sampled in the previous steps are not stored since, for large $L$, the number of one-mutant neighbors sampled in previous steps can be ignored in comparison to $L$ (Flyvbjerg and Lautrup 1992; Orr 2002; Jain and Seetharaman 2011). The walk stops when all $L$ fitnesses have a lower value than the current fitness. The process is iterated $10^{…}$ times and the mean walk length calculated. The fitness and selection coefficient at each step are averaged over only those walks that proceed until that step. As this number is equal to or lower than the total number of iterations, the fitness and selection coefficients at later steps are averaged over a smaller number of iterations.
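The uncorrelated procedure above can be sketched as follows. We assume the GPD form $p(f) = (1+\kappa f)^{-(1+\kappa)/\kappa}$ and sample it by inverting $Q(f) = (1+\kappa f)^{-1/\kappa}$, and we take the discrete version of the transition probability to be: among the freshly drawn neighbours that are fitter than the current fitness $h$, pick $f$ with probability proportional to $f - h$. Function names, the seeding and the number of runs are illustrative; this is not the authors' code.

```python
import math
import random


def sample_gpd(kappa, rng):
    """Inverse-CDF draw from the assumed GPD p(f) = (1 + kappa f)^(-(1+kappa)/kappa), f >= 0."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    if kappa == 0.0:
        return -math.log(u)
    return (u ** (-kappa) - 1.0) / kappa


def adaptive_walk(f0, L, kappa, rng):
    """One walk on an uncorrelated landscape; returns the fitnesses fixed at steps 1, 2, ..."""
    h, fixed = f0, []
    while True:
        # L fresh one-mutant neighbours per step; earlier draws are not stored
        better = [f for f in (sample_gpd(kappa, rng) for _ in range(L)) if f > h]
        if not better:
            return fixed  # every neighbour is less fit: local peak reached
        h = rng.choices(better, weights=[f - h for f in better], k=1)[0]
        fixed.append(h)


def mean_walk_length(f0, L, kappa, runs=1000, seed=0):
    """Average number of substitutions over independent walks."""
    rng = random.Random(seed)
    return sum(len(adaptive_walk(f0, L, kappa, rng)) for _ in range(runs)) / runs


print(mean_walk_length(1.0, 1000, 0.0, runs=100))
```

Per-step averages of the fitness or selection coefficient would be taken, as described above, only over the walks returned by `adaptive_walk` that are long enough to contain that step.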

On correlated fitness landscapes with $B = 2$, to fix the initial fitness of the sequence at $f_0$, two random variables are generated independently from $p(f)$. If the sum of the two random variables is $2f_0 \pm \delta$, where $\delta \sim 0.01 f_0$, they are accepted as block fitnesses; otherwise the process is repeated. At each step, $L_B$ new fitnesses for both blocks are generated, and the walk proceeds by choosing one of the $2L_B$ fitnesses as per the transition probability (5). The chosen new fitness changes only the fitness of the corresponding block, and the other one remains the same as in the preceding step. This process is continued until all the $L_B$ fitnesses in both blocks have a fitness lower than the current fitness, at which stage the walk terminates. In this case, the data were averaged over $10^{…}$ iterations.
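A matching sketch for the correlated case with $B = 2$ uses the same assumptions. The initial block fitnesses are obtained by rejection, accepting a pair whose sum lies within $2f_0 \pm \delta$ with $\delta = 0.01 f_0$ taken literally; at each step $L_B$ fresh fitnesses are drawn per block, a candidate in block $b$ counts as beneficial if it exceeds that block's current fitness, and it is chosen with probability proportional to the resulting gain in sequence fitness. The function names are hypothetical.

```python
import math
import random


def sample_gpd(kappa, rng):
    """Inverse-CDF draw from the assumed GPD p(f) = (1 + kappa f)^(-(1+kappa)/kappa)."""
    u = 1.0 - rng.random()  # u in (0, 1]
    if kappa == 0.0:
        return -math.log(u)
    return (u ** (-kappa) - 1.0) / kappa


def initial_blocks(f0, kappa, rng, rel_width=0.01):
    """Rejection step: accept two block fitnesses whose sum lies in 2 f0 +/- rel_width * f0."""
    delta = rel_width * f0
    while True:
        a, b = sample_gpd(kappa, rng), sample_gpd(kappa, rng)
        if abs(a + b - 2.0 * f0) <= delta:
            return [a, b]


def block_walk(f0, LB, kappa, rng):
    """One walk on the B = 2 block landscape; returns the sequence fitness after each step."""
    blocks = initial_blocks(f0, kappa, rng)
    history = []
    while True:
        # (block index, candidate block fitness) pairs that beat that block's current fitness
        better = [(b, g) for b in range(len(blocks))
                  for g in (sample_gpd(kappa, rng) for _ in range(LB)) if g > blocks[b]]
        if not better:
            return history  # all 2 L_B candidates are worse: local peak reached
        # the sequence fitness gain is (g - blocks[b]) / B, so weight by g - blocks[b]
        weights = [g - blocks[b] for b, g in better]
        b, g = rng.choices(better, weights=weights, k=1)[0]
        blocks[b] = g  # only the chosen block changes
        history.append(sum(blocks) / len(blocks))


print(len(block_walk(1.0, 1000, 0.0, random.Random(0))))
```

Because only one block changes per step, the sequence fitness rises by at most the gain of that block divided by $B$, which is the origin of the $1/B$ factors in the correlated results.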
C Solution of the generating function for infinite sequence



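For infinitely long sequences, the cumulative probability distribution $q^L(h) \to 0$ and the differential equation (23) reduces to

$$\frac{d^2 G}{df^2} = \frac{x(1-\kappa)}{(1+\kappa f)^2}\, G \tag{C.1}$$

which can be easily solved (Bender and Orszag 1999). Using (24) and (25), we find the solution to be

$$G(x,f) = \frac{x(1-\kappa)(1+\kappa f_0)^{1/\kappa}}{\sqrt{\kappa^2 + 4x(1-\kappa)}}\left[\left(\frac{1+\kappa f}{1+\kappa f_0}\right)^{a_+} - \left(\frac{1+\kappa f}{1+\kappa f_0}\right)^{a_-}\right] \tag{C.2}$$

where

$$a_\pm = \frac{1}{2}\left(1 \pm \sqrt{1 + \frac{4x(1-\kappa)}{\kappa^2}}\right) \tag{C.3}$$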



In terms of $z$ defined in (9), the solution (C.2) for $\kappa \neq 0$ can be written as



G{x, z) 



x{l- k)(1 + /./o)V- 

^yK^ + 4:X{l-K) 



, 1 + /^/ 



(C.4) 
Comparing the above equation with (26a), we get



x(l-/t)(l + /t/o)V- / 1 + Kf 



^/k'^ + 4x(l - \l + ft:/o^ 
x(1-k)(1 + /./o)V- / 1 + Kf 



For exponentially distributed fitnesses, taking the limit $\kappa \to 0$ in (C.2) and using (…), we find that



^(x, z) = f e^V^e^^'-^o)^ - e-'^e-^^~-f°^^) (C.6) 



from which we obtain 



a, = ^e^f~fo)V-^ (C.7a) 
a, = _ v^g-(/-/o)VS (c.7b) 



D Large deviations theory 



According to the large deviations principle, the distribution $I_B(X)$ in (50) is of the form (Touchette 2009)

$$I_B(X = Bx) \sim e^{-B\, r(x)} \tag{D.1}$$
where the rate function $r(x)$ can be determined as described below. For independent random variables chosen from a common distribution $g(t)$, the distribution of the sum of $B$ i.i.d. random variables is given by (50). On using the integral representation of the Dirac delta function, we get



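$$I_B(X) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\, e^{ikX}\left[\int_0^{\infty} dy\, e^{-iky}\, g(y)\right]^B \tag{D.2}$$

$$= \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} d\omega\, e^{\omega X}\, \tilde g(\omega)^B \tag{D.3}$$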



where $\tilde g(\omega)$ is the Laplace transform of the distribution function $g(t)$, defined as

$$\tilde g(\omega) = \int_0^{\infty} dt\, g(t)\, e^{-\omega t} \tag{D.4}$$



Evaluating the RHS of (D.3) using a saddle point method (Bender and Orszag 1999), we get

$$I_B(X) \sim e^{\omega^* X + B \ln \tilde g(\omega^*)} \tag{D.5}$$



where the saddle point $\omega^*$ is real and given by

$$\left.\frac{d \ln \tilde g}{d\omega}\right|_{\omega = \omega^*} = -x \tag{D.6}$$



Then, using the large deviation principle (D.1) on the LHS of (D.5), the rate function is obtained as

$$r(x) = -\left[\omega^* x + \ln \tilde g(\omega^*)\right] \tag{D.7}$$
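The saddle-point recipe is easy to carry out numerically once $\ln \tilde g$ and its derivative are known. The sketch below solves $d\ln\tilde g/d\omega = -x$ by bisection, using the fact that $\ln \tilde g(\omega)$ is convex so that its derivative increases with $\omega$, and evaluates the rate function with the convention $I_B \sim e^{-B r(x)}$, i.e. $r(x) = -[\omega^* x + \ln \tilde g(\omega^*)]$. The exponential choice $g(t) = e^{-t}$, for which $\tilde g(\omega) = 1/(1+\omega)$ and $r(x) = x - 1 - \ln x$, is only an illustrative test case, and the function names are ours.

```python
import math


def rate_function(x, log_g, dlog_g, lo, hi, iters=200):
    """r(x) = -[w* x + ln g~(w*)], where d ln g~/dw equals -x at w*; w* found by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlog_g(mid) + x < 0.0:  # derivative increases with w, so the root lies to the right
            lo = mid
        else:
            hi = mid
    w_star = 0.5 * (lo + hi)
    return -(w_star * x + log_g(w_star))


# Illustrative case g(t) = exp(-t): ln g~(w) = -ln(1 + w), defined for w > -1.
def log_g(w):
    return -math.log1p(w)


def dlog_g(w):
    return -1.0 / (1.0 + w)


for x in (0.5, 1.0, 2.0):
    print(x, rate_function(x, log_g, dlog_g, -0.999, 100.0), x - 1.0 - math.log(x))
```

The rate function vanishes at $x = 1$, the mean of $g$, and is positive elsewhere, as expected for a sum of i.i.d. variables.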
[Figure 1 plot; horizontal axis: $(1/\kappa)\ln(1+\kappa f_0)$]

Figure 1: The main plot shows the average walk length when the initial fitness is varied on uncorrelated fitness landscapes for various $\kappa$ and $L = 1000$. The simulation data are shown by points, and the lines show (40) for all $\kappa < 1$ and serve as a guide to the eye for $\kappa = 3/2$. The inset shows the fitness fixed at the first step as a function of the initial fitness $f_0$ on uncorrelated fitness landscapes. The simulation data shown by open symbols are for $L = 1000$, while the filled ones are for $L = 10000$ when $\kappa = 2/3$ and $L = 2000$ when $\kappa \to 0$ and $3/2$. For clarity, the data for $\kappa = 3/2$ ($2/3$) are divided by $10^{…}$ (200) for both $L$ values, and the data for the two sequence lengths coincide for exponentially distributed fitnesses. In all the plots, the lines are guides to the eye.
Figure 2: The plot shows the average walk length per block for uniformly distributed fitnesses ($\kappa = -1$) when the initial fitness is varied for different fitness correlations. The theoretical prediction (40) on uncorrelated fitness landscapes, and (60) and (62) for $f_0 < 1/2$ and $f_0 > 1/2$, respectively, on correlated fitness landscapes are shown by lines for $L = 1000$. The corresponding simulation data are shown by points.
[Figure 3 plot; horizontal axis: $(1/\kappa)\ln(1+\kappa f_0)$]

Figure 3: The main plot shows the average walk length per block for exponentially distributed fitnesses ($\kappa \to 0$) when the initial fitness is varied for different fitness correlations. The theoretical prediction (40) on uncorrelated fitness landscapes and (…) for correlated fitness landscapes are shown by lines for $L = 1000$. The corresponding simulation data are shown by points. The inset shows the distribution of the selection coefficient obtained numerically (points) for $L = 1000$, which is compared against the result (98) for an infinitely long sequence (lines) when the fitnesses are distributed exponentially.
Figure 4: The main plot shows the average walk length for the fitness distribution with $\kappa = 2/3$ when the initial fitness is varied for different fitness correlations. The theoretical prediction (40) on uncorrelated fitness landscapes and (71) for correlated fitness landscapes are shown by lines for $L_B = 1000$. The corresponding simulation data are shown as points. The inset shows the numerical data for the average fitness fixed during the walk as a function of $L$ when $\kappa = 2/3$, $B = 1$ and $f_0 = 1$, to support the expectation that it scales as $L^{2\kappa-1}$ (refer (89)). The lines are best fits to curves of the form $\bar f_J = A_1(J) + A_2(J)\, L^{2\kappa-1}$.

Figure 5: The plots show the average selection coefficient (main) and the average fitness (inset) as a function of $J$ on uncorrelated fitness landscapes for $\kappa =$ (a) $-1$, (b) $0$, (c) $1/4$, (d) $2/3$ and $L = 1000$. The lines in Fig. 5(b) are the theoretical predictions (99) and (81), while in all the other cases the lines are guides to the eye. The simulation data are shown by points.
Figure 6: The plot shows the (scaled) average fitness difference between successive steps as a function of the number of adaptive substitutions for $L = 1000$ (open symbols) and $f_0 = \ldots$ on uncorrelated fitness landscapes. The theoretical prediction (83) is shown by lines and the simulation data by points. The filled boxes are the simulation data for $L = 10000$ and $\kappa = 1/4$, showing that the agreement with the theoretical prediction (83) improves with increasing $L$. The standard deviation about the mean fitness difference is shown by error bars for a few representative points.
Figure 7: The plot shows the (scaled) average fitness difference between successive steps as a function of initial fitness on uncorrelated fitness landscapes for various $\kappa$ and $L = 1000$ when $J = 1$ (main plot) and $J = 2$ (inset). The points give the simulation data and the lines are the theoretical prediction (83).
Figure 8: The main plot shows the (scaled) average fitness difference at the first step as a function of initial fitness for various $\kappa$ on correlated fitness landscapes with $B = 2$ and $L_B = 1000$. The points show the simulation data while the lines are guides to the eye. The inset shows the fitness evolution during the course of the adaptive walk for exponentially distributed fitnesses, $f_0 = 1$ and $L_B = 1000$, when the fitness correlations are varied. The lines are the theoretical prediction (104) and the points give the simulation data.
SUPPLEMENTARY INFORMATION

S1 Distribution of fixed beneficial mutations

Figure S1: Plot of $V_J(f)$ for (a) $\kappa = -2$ and (b) $\kappa = -1/2$, showing the behavior of the most probable fitness when $L = 1000$. The lines show the simulation data for initial fitness $f_0 = \ldots$ and the points for $f_0 > \ldots$ in both cases.
S2 Behavior of fitness difference for $\kappa = 2/3$

Figure S2: The plot shows the (scaled) average fitness difference between successive steps on uncorrelated fitness landscapes for $\kappa = 2/3$ and $L = 1000$. The open symbols give the fitness difference when $f_0 = 1$, while the shaded symbols give $\Delta f_1$ for various $f_0$.
S3 Selection coefficient averaged over initial fitnesses

If the initial conditions are random, integrating (95) over the fitness distribution with $\kappa > 0$ gives

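$$
\begin{aligned}
\Psi_1(s) &= \int_0^\infty df_0\, p(f_0)\,\Psi_1(s|f_0) && \text{(S3.1)}\\
&= s(1-\kappa)\int_0^\infty df_0\,\frac{f_0^2}{(1+\kappa f_0)^2}\,\bigl[1+\kappa(s+1)f_0\bigr]^{-(1+\kappa)/\kappa} && \text{(S3.2)}
\end{aligned}
$$

which gives $\Psi_1(s) = 2s(1+s)^{-3}$ for exponentially distributed fitnesses.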
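The exponential result can be checked by evaluating (S3.2) numerically. This sketch takes $\kappa = 0$ to mean the limit $[1+\kappa(s+1)f_0]^{-(1+\kappa)/\kappa} \to e^{-(s+1)f_0}$. It maps $f_0 \in [0,\infty)$ to $t \in [0,1)$ through $f_0 = t/(1-t)$ and applies composite Simpson quadrature. The helper names and the grid size are assumptions made for illustration.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def psi1_averaged(s, kappa):
    """Right-hand side of (S3.2) for 0 <= kappa < 1; kappa = 0 means the
    exponential limit. f0 in [0, inf) is mapped to t in [0, 1) by f0 = t/(1-t)."""
    def integrand(t):
        if t >= 1.0:
            return 0.0  # the integrand vanishes as f0 -> infinity
        f0 = t / (1.0 - t)
        jacobian = 1.0 / (1.0 - t) ** 2
        if kappa == 0:
            tail = math.exp(-(s + 1.0) * f0)
        else:
            # exp/log1p form of [1 + kappa (s+1) f0]^(-(1+kappa)/kappa); underflows cleanly to 0
            tail = math.exp(-(1.0 + kappa) / kappa * math.log1p(kappa * (s + 1.0) * f0))
        return f0 ** 2 / (1.0 + kappa * f0) ** 2 * tail * jacobian
    return s * (1.0 - kappa) * simpson(integrand, 0.0, 1.0)


for s in (0.5, 1.0, 3.0):
    print(s, psi1_averaged(s, 0.0), 2 * s / (1 + s) ** 3)
```

A small positive $\kappa$ should give values close to the exponential ones, which is a quick consistency check on the $\kappa \to 0$ limit.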

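Similarly, for bounded distributions ($\kappa < 0$), we have

$$ \Psi_1(s) = s(1-\kappa)\int_0^{1/(|\kappa|(1+s))} df_0\,\frac{f_0^2}{(1+\kappa f_0)^2}\,\bigl[1+\kappa(s+1)f_0\bigr]^{-(1+\kappa)/\kappa} \qquad \text{(S3.3)} $$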



In particular, for $\kappa = -1$, we get

$$ \Psi_1(s) = 2 + \frac{2s}{1+s} + 4s\ln\left(\frac{s}{1+s}\right) \qquad \text{(S3.4)} $$

Note that for large $s$, the distribution $\Psi_1(s)$ for exponentially and uniformly distributed fitnesses decays as $s^{-2}$, so one may anticipate a diverging mean.
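For $\kappa = -1$, the closed form (S3.4) can be compared with direct quadrature of (S3.3). In that case the upper limit is $1/(1+s)$ and the integrand reduces to $2s\,f_0^2/(1-f_0)^2$. Expanding (S3.4) for large $s$ gives $\Psi_1(s) \approx 2/(3s^2) - 1/s^3$, which is the $s^{-2}$ decay noted above. The sketch also checks that (S3.4) integrates to one, as a density must. Helper names and grid sizes are illustrative assumptions.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def psi1_uniform_closed(s):
    """Closed form (S3.4); ln(s/(1+s)) = -log1p(1/s) avoids cancellation at large s."""
    return 2.0 + 2.0 * s / (1.0 + s) - 4.0 * s * math.log1p(1.0 / s)


def psi1_uniform_numeric(s):
    """(S3.3) with kappa = -1: 2s times the integral of f0^2/(1-f0)^2 over [0, 1/(1+s)]."""
    upper = 1.0 / (1.0 + s)
    return 2.0 * s * simpson(lambda f0: f0 ** 2 / (1.0 - f0) ** 2, 0.0, upper)


def total_probability():
    """Integral of (S3.4) over s in (0, inf), using s = u/(1 - u), ds = du/(1 - u)^2."""
    def g(u):
        if u == 0.0:
            return 2.0  # Psi_1(0) = 2
        if u == 1.0:
            return 2.0 / 3.0  # (1 + s)^2 Psi_1(s) -> 2/3 as s -> infinity
        s = u / (1.0 - u)
        return psi1_uniform_closed(s) / (1.0 - u) ** 2
    return simpson(g, 0.0, 1.0)


for s in (0.5, 2.0, 10.0):
    print(s, psi1_uniform_closed(s), psi1_uniform_numeric(s))
print("normalisation:", total_probability())
print("s^2 * Psi_1(s) at s = 1000:", 1e6 * psi1_uniform_closed(1e3))
```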
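In fact, the mean is undefined for all $\kappa < 1$. To see this, consider the integral

$$
\begin{aligned}
I(s) &= s\int_0^\infty dx\,\frac{x^2}{(1+x)^2}\,(1+xs)^{-a} && \text{(S3.5)}\\
&= s^{1-a}\int_0^\infty dx\,\frac{x^2}{(1+x)^2}\,(x+s^{-1})^{-a} && \text{(S3.6)}\\
&\approx s^{1-a}\int_0^\infty dx\,\frac{x^{2-a}}{(1+x)^2} && \text{(S3.7)}
\end{aligned}
$$

where we have assumed $s \gg 1$ and $a = (1+\kappa)/\kappa$. For large $s$, $I(s)$ is proportional to (S3.2): substituting $x = \kappa f_0$ and replacing $s+1$ by $s$ gives $\Psi_1(s) \approx (1-\kappa)\kappa^{-3} I(s)$. The integral in (S3.7) is finite provided $\kappa > 1/2$. We also have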

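$$
\begin{aligned}
I(s) &= s^{-2}\int_0^\infty dx\,\frac{x^2}{(1+xs^{-1})^2}\,(1+x)^{-a} && \text{(S3.8)}\\
&\approx s^{-2}\int_0^\infty dx\,x^2(1+x)^{-a} && \text{(S3.9)}
\end{aligned}
$$

where the last integral is finite provided $\kappa < 1/2$. Splitting the integral $I(s)$ as follows:

$$
\begin{aligned}
I(s) &= s\left[\int_0^{1/s} + \int_{1/s}^{1} + \int_1^{\infty}\right] dx\,\frac{x^2}{(1+x)^2}\,(1+xs)^{-a} && \text{(S3.10)}\\
&\approx s\int_0^{1/s} dx\,x^2 + s^{1-a}\int_{1/s}^{\infty} dx\,\frac{x^{2-a}}{(1+x)^2} && \text{(S3.11)}\\
&= s^{-1/\kappa}\int_0^{\infty} dx\,\frac{x^{2-a}}{(1+x)^2} + O(s^{-2}) && \text{(S3.12)}
\end{aligned}
$$

for $1/2 < \kappa < 1$. Here (S3.11) replaces $(1+xs)^{-a}$ by $1$ for $x < 1/s$ and by $(xs)^{-a}$ for $x > 1/s$, which changes $I(s)$ only at order $s^{-2}$.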

Thus for $\kappa < 1/2$, the distribution $\Psi_1(s) \sim s^{-2}$, while for $1/2 < \kappa < 1$ it decays as $s^{-1/\kappa}$. In both cases the mean selection coefficient diverges for all $\kappa < 1$.
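The two regimes can be seen numerically by estimating the log-log slope of $I(s)$ in (S3.5) between two large values of $s$. The slope should approach $-2$ for $\kappa < 1/2$ and $-1/\kappa$ for $1/2 < \kappa < 1$. The sketch integrates on $y = \ln x$ so that both scales, $x \sim 1/s$ and $x \sim 1$, are resolved. The choices $s = 10^3, 10^4$, the range $|y| \le 40$ and the grid size are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def simpson(g, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def I_of_s(s, kappa):
    """I(s) of (S3.5) with a = (1 + kappa)/kappa, integrated over y = ln x
    (dx = x dy) so that the scales x ~ 1/s and x ~ 1 are both resolved."""
    a = (1.0 + kappa) / kappa

    def g(y):
        x = math.exp(y)
        return s * x ** 3 / (1.0 + x) ** 2 * math.exp(-a * math.log1p(x * s))

    return simpson(g, -40.0, 40.0)


def tail_exponent(kappa, s1=1e3, s2=1e4):
    """Log-log slope of I(s) between s1 and s2."""
    return math.log(I_of_s(s2, kappa) / I_of_s(s1, kappa)) / math.log(s2 / s1)


for kappa in (0.25, 0.75):
    print(kappa, tail_exponent(kappa), -2.0 if kappa < 0.5 else -1.0 / kappa)
```

Close to $\kappa = 1/2$ the two contributions have the same order, so larger $s$ would be needed before the slope settles.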
S4 Fitness difference on correlated fitness landscapes

Figure S3: The plot shows the (scaled) average fitness difference between successive steps on correlated fitness landscapes with $B = 2$ as a function of the number of adaptive substitutions when $f_0 = 0$. The symbols give the simulation data for $L = 2000$ while the lines are guides to the eye.
S5 Behavior of selection coefficient

Figure S4: Plot showing the average selection coefficient at each step (a) for $\kappa = 3/2$ when correlations are absent and (b) on correlated fitness landscapes with $B = 2$ and $f_0 = 0$. The points show the simulation data while the lines are guides to the eye. The sequence length $L_B = 1000$ in all plots.